During the period of this grant, Jennifer Hou moved from Ohio State University to the University of Illinois
at  Urbana  Champaign.  Professor  Hou’s  portion  of  the  MURI  grant  was  transferred  from  Ohio  State 
University  to  the  University  of  Illinois  at  Urbana  Champaign.  She  was  promoted  from  Associate  Professor 
to  Professor  in  August  2005. 

Mary  Baker  left  Stanford  University  to  join  Hewlett  Packard  Laboratories  in  2003.  Elizabeth  Belding- 
Royer  at  the  University  of  California,  Santa  Barbara,  replaced  Mary  Baker  on  this  grant.  She  was  promoted 
from  Assistant  Professor  to  Associate  Professor  in  July  2005. 

Douglas  Schmidt  moved  from  the  University  of  California,  Irvine,  to  become  a  program  manager  at 
DARPA.  Subsequently,  he  moved  to  Vanderbilt  University  where  he  is  now  a  professor.  Raymond 
Klefstad  replaced  Douglas  Schmidt  at  the  University  of  California,  Irvine,  as  a  principal  investigator  on  this 
grant. 

2.  Project  Objectives 

The  scientific  goal  of  this  AFOSR  MURI  effort  is  to  advance  the  state-of-the-art  in  design,  implementation, 
and  evaluation  of  network  protocols  to  meet  the  needs  of  complex  distributed  systems.  More  specifically, 
the  research  has  focused  on  (i)  the  design  and  evaluation  of  protocols  for  real-time  performance,  fault 
tolerance  and  security,  as  well  as  protocols  for  mobile  wireless  environments,  (ii)  the  design  and 
implementation  of  analysis/simulation  tools  to  validate/evaluate  protocols,  and  (iii)  the  development  of 
strategies  and  middleware  to  ease  the  programming  of  complex  networked  applications. 

The  ultimate  objectives  are  to 

•  Design  network  protocols  and  environments  that  allow  the  programmers  to  create  their 
applications  without  having  to  worry  about  timing,  fault  tolerance,  mobility  or  security; 

•  Develop  network  protocols  whose  behavior  is  predictable,  even  under  adverse  conditions  or  in  the 
absence  of  complete  information; 

•  Develop  design/analysis/verification  tools  that  help  developers  of  defense  and  other  applications  to 
design  their  network-centric  systems  and  to  understand  effects  of  their  design  decisions;  and 

•  Devise  methodologies  that  allow  application  designers  and  programmers  to  express,  to  other 
designers,  to  toolsets,  or  to  operating  systems,  the  need  of  applications  in  terms  of  functionality, 
real-time  performance,  and  fault-tolerant  operations. 

In  this  AFOSR  MURI  grant,  the  principal  investigators  focused  their  efforts  on  eight  different  research 
topics, namely:

•  Protocols  for  Supporting  Real  Time  Resource  Management 

•  Protocols  and  Infrastructure  for  Supporting  Fault  Tolerance 

•  Methodology  and  Protocols  for  Supporting  Security 

•  Protocols  for  Supporting  Scalable  Real-Time  Multicasts 

•  Protocols  for  Supporting  Mobile  Wireless  Environments 

•  Bio-Networking  Architectures 

•  Middleware  for  Supporting  Application  Programs  with  End-to-End  QoS 

•  Tools  for  Supporting  Network  Simulation/Emulation. 

We  summarize  the  major  accomplishments,  scientific  research,  and  impact  related  to  these  research  topics 
below.  We  also  report  on  education  and  student  training,  publications,  presentations,  other  professional 
activities,  technology  transfer,  patent  disclosures,  software  releases,  and  honors  during  the  period  supported 
by  this  AFOSR  MURI  grant. 
3. Major Accomplishments


3.1  Protocols  for  Supporting  Real  Time  Resource  Management 

•  Enhanced  the  Real  Time  CORBA  support  in  the  TAO  Object  Request  Broker  (Schmidt). 

•  Designed  and  implemented  an  infrastructure,  based  on  the  TAO  CORBA  ORB,  developed  by 
Schmidt  and  Suda  at  UC  Irvine,  that  integrates  load  balancing  and  fault  tolerance  (Melliar-Smith 
and  Moser). 

•  Leveraged  long-range  dependency  characteristics  in  Internet  traffic,  exploited  its  correlation 
structure  for  predicting  future  traffic  at  multiple  time  scales,  utilized  prediction  results  online  for 
resource  control  in  active  queue  management  and  congestion  control  in  TCP  (Hou). 


3.2  Protocols  and  Infrastructure  for  Supporting  Fault  Tolerance 

• Designed and implemented an interception-based fault-tolerant CORBA infrastructure, called the
Eternal system. The Eternal system intercepts the IIOP messages that would have been sent by the
CORBA ORB and instead multicasts them to the replicas of an object using a reliable totally-ordered
multicast protocol (Melliar-Smith and Moser).

•  Designed  and  implemented  a  pluggable  fault-tolerant  CORBA  infrastructure,  called  the  Pluggable 
Fault-Tolerant  infrastructure,  that  provides  fault  tolerance  for  CORBA  applications  by  utilizing  the 
Pluggable  Protocols  Framework  (Melliar-Smith  and  Moser). 

•  Developed  the  Totem  Redundant  Ring  Protocol  that  supports  three  different  network  replication 
strategies:  Active,  Passive,  and  a  hybrid  Active-Passive  network  replication  strategy  (Melliar- 
Smith  and  Moser). 

•  Designed  and  implemented  the  Aroma  system,  which  provides  transparent  fault  tolerance  for  Java 
applications  that  use  Java  Remote  Method  Invocation  (Moser  and  Melliar-Smith). 

•  Designed  and  implemented  a  software  infrastructure  that  unifies  transactions  and  replication  for 
three-tier  architectures  and,  thus,  provides  high  availability  and  fault  tolerance  for  enterprise 
applications.  (Moser  and  Melliar-Smith). 

•  Designed  and  evaluated  a  reservation  protocol  strategy  for  business  activities  that  span  multiple 
enterprises,  such  as  Web  Services  (Moser  and  Melliar-Smith). 


3.3  Methodology  and  Protocols  for  Supporting  Security 

•  Developed  a  formal  framework  to  analyze  and  improve  detection  rules  of  intrusion  detection 
systems.  Validation  of  the  model  and  the  framework  are  established  using  the  System  Health  and 
Intrusion  Monitoring  (SHIM)  intrusion  detection  system  (Levitt). 

• Developed a formal framework to analyze and improve the specification-based intrusion detection
rules developed for an auto-configuration protocol, i.e., the Dynamic Registration and
Configuration Protocol (DRCP), used by mobile ad hoc networks (Levitt).

•  Developed  a  distributed,  automated  response  model  that  utilizes  a  Proportional-Integral-Derivative 
controller  to  aid  in  handling  network  traffic  flow  management.  This  response  model  is  designed  to 
address  Distributed  Denial  of  Service  attacks  that  exploit  the  availability  of  servers  and  routers, 
resulting  in  the  severe  loss  of  their  connectivity  (Levitt). 
• Developed a specification-based intrusion detection system for detecting attacks on the Ad Hoc On
Demand  Distance  Vector  (AODV)  Routing  protocol,  designed  for  mobile  ad  hoc  networks 
(Levitt). 

•  Developed  a  specification-based  intrusion  detection  system  for  detecting  attacks  on  the  Optimized 
Link  State  Routing  (OLSR)  protocol,  designed  for  mobile  ad  hoc  networks  (Levitt). 

•  Developed  an  intrusion  monitor  for  the  Dynamic  Host  Configuration  Protocol  (DHCP).  This 
monitor  is  a  plug-in  preprocessor  for  Snort  that  detects  actual  behavior  of  network  elements 
violating  the  security  specifications  that  were  developed  in  studying  the  behavior  of  DHCP 
(Levitt). 

•  Developed  a  general  automatic  response  model  for  mobile  ad  hoc  networks  (Levitt). 

•  Developed  a  distributed  event-driven  message  exchange  model  for  intrusion  detection  within  a 
mobile  ad  hoc  networking  environment  (Levitt). 

•  Investigated  the  completeness  and  accuracy  properties  of  unreliable  fault  detectors  for 
asynchronous  distributed  systems  that  are  subject  to  Byzantine  faults  (Moser  and  Melliar-Smith). 

•  Won  the  Wilkes  Award  from  the  British  Computer  Society  for  the  paper,  “Unreliable  Byzantine 
Fault  Detectors  for  Solving  Consensus,”  published  in  the  Computer  Journal  (Moser  and  Melliar- 
Smith). 

•  Development  of  a  classification  model  for  attacks  against  ad  hoc  networks.  Our  model  is 
represented  as  an  attack  cuboid.  The  axes  of  the  cuboid  are  based  on  the  security  goals,  the 
network  layer  the  attack  targets,  and  the  type  of  attack.  All  currently  known  attacks  can  be 
classified  using  this  model  (Belding-Royer). 

•  Development  of  AODVSTAT,  a  tool  for  detecting  local  attacks  in  ad  hoc  networks.  Currently, 
five  different  attacks  can  be  detected  by  intrusion  detection  sensors  running  AODVSTAT 
(Belding-Royer). 

•  Development  of  Distributed  AODVSTAT,  a  distributed  tool  for  performing  global  intrusion 
detection  in  an  ad  hoc  network.  IDS  agents  running  D-AODVSTAT  communicate  and  correlate 
the  information  they  collect  to  detect  more  general  attacks  against  the  network.  Currently,  the  false 
propagation  of  messages  attack  can  be  detected  using  this  tool  (Belding-Royer). 

•  Testing  and  performance  evaluation  of  AODVSTAT  and  D-AODVSTAT  within  an  ad  hoc 
network  testbed.  The  testbed  consists  of  eight  laptops.  The  six  defined  attacks  were  implemented 
against  this  network  in  a  series  of  experiments.  Results  show  that  over  98%  of  the  attacks  are 
detected with approximately 2% false positives (Belding-Royer).

•  Development  of  the  emulation  system  that  enables  the  evaluation  of  AODVSTAT  and  D- 
AODVSTAT  in  large,  diverse  networks.  Four  attacks  were  simulated  in  networks  as  large  as  50 
nodes  in  a  series  of  experiments.  A  variety  of  different  movement  patterns  and  network  sizes  were 
evaluated (Belding-Royer).

•  Development  of  the  Authenticated  Routing  for  Ad  hoc  Networks  (ARAN)  protocol  that  detects 
and  protects  against  malicious  actions  by  third  parties  and  peers  in  ad  hoc  environments  (Belding- 
Royer). 

•  Design  of  a  classification  scheme  that  identifies  different  environments/scenarios  and  their 
associated  security  requirements  (Belding-Royer). 
3.4 Protocols for Supporting Scalable Real-Time Multicasts

•  Produced  designs  of  new  scalable,  real-time  video  multicast  protocols,  and  produced  simulation 
models  and  simulators,  to  investigate  characteristics  of  the  protocols  (Suda). 

•  Designed  a  general  temporal  QoS  framework  for  QoS-driven  shared  tree  multicast  routing  and 
devised  eligibility  tests  for  member  join/leave  procedures  (Hou). 

•  Developed  a  multicast  group  communication  engine  and  bridge  that  allows  the  use  of  multiple 
group  communication  protocols  concurrently  (Moser  and  Melliar-Smith). 

•  Based  on  insights  derived  from  experimental  measurements  of  the  Pluggable  Fault-Tolerant 
Infrastructure,  we  also  developed  a  new  replication  strategy  using  semi-active  replication  and  low- 
latency  real-time  multicasts  (Melliar-Smith  and  Moser). 

•  Won  the  best  paper  award  at  the  International  Symposium  on  Performance  Evaluation  of 
Computer  and  Telecommunication  Systems  for  the  paper,  “End-to-end  latency  analysis  and 
evaluation  of  a  fault-tolerant  CORBA  infrastructure”  (Melliar-Smith  and  Moser). 


3.5  Protocols  for  Supporting  Mobile  Wireless  Environments 

•  Investigated  participation  incentives  for  proper  routing  in  ad  hoc  networks.  In  particular,  we  have 
developed  two  general  types  of  routing  enhancements  to  detect  and  handle  misbehaving  nodes 
(Baker). 

•  Examined  the  queuing  dynamics  of  nodes  in  ad  hoc  networks  across  a  wide  mobility  spectrum  in 
both  on-demand  and  pro-active  protocols  and  studied  how  to  improve  the  performance  of  ad  hoc 
networks  through  innovative  packet  scheduling  (Baker). 

• Developed a number of novel schemes to ensure loop freedom in on-demand routing protocols
(Garcia-Luna-Aceves).

•  Designed  and  analyzed  a  new  type  of  routing  protocols  for  ad  hoc  networks  that  combines  on 
demand  and  proactive  routing  modalities  (Garcia-Luna-Aceves). 

•  Developed  a  new  analytical  model  for  MAC  protocols  in  multihop  ad  hoc  networks  applicable  to 
omni-directional  and  directional  antennas  (Garcia-Luna-Aceves). 

•  Developed  a  new  MAC  protocol  for  ad  hoc  networks  based  on  directional  antennas  that  permits 
network  nodes  to  schedule  their  transmissions  to  avoid  collisions  (Garcia-Luna-Aceves). 

•  Designed  and  analyzed  the  Hybrid  Activation  Multiple  Access  (HAMA),  a  new  type  of  medium 
access  control  protocol  for  ad  hoc  networks  capable  of  scheduling  the  transmission  of  unicast  and 
broadcast  traffic  using  two-hop  neighborhood  information  (Garcia-Luna-Aceves). 

•  Developed  a  novel  transmission  scheduling  approach  for  ad  hoc  networks  based  on  two-hop 
neighborhood  information  that  is  fair  and  can  be  applied  to  node  or  link  activation  schemes 
(Garcia-Luna-Aceves).

•  Developed  a  novel  scheme  for  topology  control  in  ad  hoc  networks  that  is  based  on  two-hop 
neighborhood  information  (Garcia-Luna-Aceves). 

•  Developed  novel  schemes  for  broadcasting  of  information  in  ad  hoc  networks  (Garcia-Luna- 
Aceves). 

• Devised a Minimum Spanning Tree topology control algorithm, called Local Minimum Spanning
Tree, and a variation (to support k-connectivity in heterogeneous environments) for wireless
multi-hop networks, and analytically proved several important properties of this algorithm (Hou).

•  Studied  the  issue  of  power  management  and  investigated  the  tradeoffs  between  reducing  the  energy 
consumption  and  minimizing  the  performance  penalty.  Explored  the  design  space  and  devised  a 
suite of power management protocols from asynchronous to synchronous, reactive to proactive,
and  agnostic  to  cross-layer  information  to  utilizing  hints  from  the  higher  layers  (Hou). 

•  Performed  a  rigorous  study  of  performance  limits  for  wireless  sensor  networks  with  respect  to 
coverage,  connectivity,  lifetime,  and  power,  and  designed  protocols  that  aim  to  achieve  these 
performance  limits  (Hou). 

•  Produced  designs  of  new  protocols  for  sensor  networks,  and  produced  simulation  models  and 
simulators,  to  investigate  characteristics  of  the  protocols  (Suda). 

•  Developed  a  natural  language  interface  and  voice-enabled  applications  for  mobile  wireless 
personal  handheld  devices  (Melliar-Smith  and  Moser). 

3.6  Bio-Networking  Architecture 

• Implemented in software the Bio-Networking Architecture and its associated protocols and
empirically evaluated the Bio-Networking Architecture protocols (Suda).

•  Produced  designs  of  protocols  for  discovery  in  peer-to-peer  systems,  based  on  the  Bio-Networking 
Architecture,  and  produced  simulation  models  and  simulators,  to  investigate  characteristics  of  the 
protocols  (Suda). 

3.7  Middleware  for  Supporting  Application  Programs  with  End-to-End  QoS 

•  Developed  a  pluggable  protocol  framework  for  the  TAO  ORB  (Schmidt). 

•  Added  support  for  key  QoS-enabled  network  protocols  (including  IntServ,  DiffServ,  and  SCTP)  to 
TAO’s  pluggable  protocols  framework  (Schmidt). 

•  Designed  and  implemented  Java-based  CORBA  middleware,  called  ZEN,  that  is  highly  optimized 
for  real-time  performance.  ZEN  has  a  modular  and  pluggable  architecture  that  enables  the 
configuration  of  transport  protocols,  object  adapters,  and  ORB  core,  in  particular  for  small  code 
size  (Klefstad  and  Schmidt). 

•  Developed  RTZen,  an  RTSJ  (Real-Time  Specification  for  Java)  implementation  of  real-time 
CORBA  and  the  successor  of  ZEN.  In  addition  to  providing  the  benefits  of  ZEN,  it  provides  much 
higher  predictability  and  throughput.  We  also  discovered  design  patterns  for  facilitating  the 
development  of  RTSJ-based  middleware  and  developed  a  tool  that  visualizes  the  scoped 
hierarchies  of  complex  applications  and  that  also  locates  memory  leaks  (Klefstad). 

•  Developed  ZEN-kit,  a  graphical  tool  for  customizing  RTZen.  This  customization  is  achieved 
through  modularizing  the  middleware  so  that  features  may  be  inserted  or  removed  based  on  the 
application  requirements  (Klefstad). 

•  Evaluated  the  performance  of  Java-based  CORBA  Component  Model  technology  for  real-time  and 
embedded  systems  (Klefstad). 

•  Received  the  Best  Paper  Award  at  the  36th  IEEE  Hawaii  International  Conference  on  System 
Science  for  our  paper  titled  “Design  and  performance  of  a  dynamically  configurable,  messaging 
protocols  framework  for  real-time  CORBA”  (Klefstad  and  Schmidt). 

•  Designed  and  implemented  an  infrastructure,  based  on  the  TAO  CORBA  ORB,  developed  by 
Schmidt  and  Suda  at  UC  Irvine,  that  integrates  load  balancing  and  fault  tolerance  (Melliar-Smith 
and  Moser). 

• Produced designs of protocols for inter-domain resource exchange (iREX), and produced
simulation models and simulators, to investigate characteristics of the protocols (Suda).
3.8 Tools for Supporting Network Simulation/Emulation

•  Developed  simulation  techniques  for  very  large  application-level  distributed  protocols  and  began 
implementation  of  Narses  using  network  models  of  varying  levels  of  details  (Baker). 

•  Completed  the  implementation,  evaluation,  and  public  release  of  a  component-based, 
compositional  network  simulation  and  emulation  environment,  called  J-Sim.  J-Sim  supports 
network  simulation  in  both  wired  and  wireless  environments,  including  wireless  sensor  networks 
(Hou). 

4.  Scientific  Research 

To  achieve  the  project  objectives,  we  have  worked  synergistically  on  the  following  research  topics: 

4.1  Protocols  for  Supporting  Real  Time  Resource  Management 

Exploitation  of  LRD  in  Internet  Traffic  for  Resource  and  Traffic  Control  (Hou):  In  several  studies, 
Internet  traffic  has  been  shown  empirically  to  be  long-range  dependent  and  self-similar.  These 
characteristics  imply  Internet  traffic  exhibits  bursty  behaviors  across  multiple  time  scales  and,  hence, 
network  resources  (such  as  bandwidth  and  buffer  space)  have  to  be  over-provisioned  according  to  peak 
traffic  load  (rather  than  average  traffic  load)  in  order  to  provide  QoS.  On  the  flip  side,  these  characteristics 
imply  that  there  exist  correlation  structures  across  multiple  time  scales  that  can  be  leveraged  in  traffic 
prediction  and  network  planning.  Central  to  the  exploitation  of  the  abundant  correlation  structure  for  traffic 
control  and  measurement  are  (i)  the  prediction  of  future  traffic  on  multiple  timescales  based  on  recent 
measurements,  and  (ii)  use  of  the  prediction  results  in  the  decision  making  of  traffic  control/measurement. 
We have carried out research along two major directions. First, we investigated both fractional-model-based
and non-fractional-model-based predictors and determined which predictor(s) render the best performance in
predicting Internet traffic. The fractional Brownian motion model, the fractal ARIMA model, and the
multifractal wavelet model are representatives of the former, and the linear minimum mean square error
predictor falls in the latter. This study employed performance measures of accuracy, ease of deployment,
computational  complexity,  and  adaptability.  It  was  based  on  analytical  reasoning,  ns-2  simulation,  and 
empirical  experiments  with  the  large  amount  of  traffic  measurement  and  traces  available  from  the 
Cooperative  Association  for  Internet  Data  Analysis.  Second,  we  have  investigated  how  to  harness  traffic 
prediction  (obtained  with  appropriate  choices  of  traffic  predictors)  in  traffic  control/measurement  for  the 
three  research  thrust  areas  discussed  below. 
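
As an illustration of the non-fractional-model-based family mentioned above, the following is a minimal sketch of a linear minimum mean square error predictor. It is not the predictor evaluated in this project: the function names, the use of sample autocovariances, and the Levinson-Durbin solver are assumptions chosen to keep the example self-contained.

```python
def autocovariance(samples, max_lag):
    """Biased sample autocovariance for lags 0..max_lag."""
    n = len(samples)
    mean = sum(samples) / n
    return [
        sum((samples[i] - mean) * (samples[i + lag] - mean) for i in range(n - lag)) / n
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(gamma, order):
    """Solve the Toeplitz normal equations for the predictor coefficients."""
    coeffs = [0.0] * (order + 1)  # coeffs[j] weights the sample j steps back
    error = gamma[0]
    for m in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate (e.g. constant) series: keep remaining weights at zero
        acc = gamma[m] - sum(coeffs[j] * gamma[m - j] for j in range(1, m))
        k = acc / error
        updated = coeffs[:]
        updated[m] = k
        for j in range(1, m):
            updated[j] = coeffs[j] - k * coeffs[m - j]
        coeffs = updated
        error *= 1.0 - k * k
    return coeffs[1:]


def lmmse_predict(history, order):
    """Predict the next sample from the last `order` samples of `history`."""
    mean = sum(history) / len(history)
    coeffs = levinson_durbin(autocovariance(history, order), order)
    recent = history[-1:-order - 1:-1]  # most recent sample first
    return mean + sum(c * (v - mean) for c, v in zip(coeffs, recent))
```

To predict on a coarser time scale, the same function could be applied to a series obtained by summing the measurements over longer, non-overlapping intervals.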

Exploration of the Use of Traffic Predictability to Improve TCP Congestion Control (Hou): We have shown
that the correlation structure present in long-range dependent traffic can be detected online and used to
predict future traffic. We then devised a novel scheme, called TCP with traffic prediction, that exploits the
prediction results to infer, in the context of AIMD steady-state dynamics, the optimal operational point at
which a TCP connection should operate. Through analytical reasoning, we have shown that the impact of
prediction errors on fairness is minimal. We have also conducted ns-2 simulation and FreeBSD 4.1
implementation studies to validate the design and to demonstrate the performance improvements in terms of
the packet loss ratio and throughput attained by connections.

Leveraging Traffic Predictability in Active Queue Management (Hou): With the
objective of stabilizing the instantaneous queue length, we have incorporated the prediction results in the
calculation of the packet dropping probability in active queue management (AQM). We call the resulting
scheme Predictive AQM (PAQM). Through analytical reasoning, we have shown that PAQM is a
generalized version of the well-known AQM scheme, random early detection (RED), that takes the future
arrival rate as a new dimension of the congestion index. By stabilizing the queue at a desirable level with
consideration of future traffic, PAQM enables the link capacity to be fully utilized, while not incurring
an excessive packet loss ratio. Using ns-2 simulation, we have compared PAQM against existing AQM
schemes with respect to different performance criteria, and have shown that in most cases PAQM
outperforms RED in stabilizing the instantaneous queue length and outperforms adaptive virtual queue in
reducing the packet loss ratio and achieving higher utilization of the link capacity.
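
The idea of adding the predicted arrival rate to a RED-style drop decision can be sketched as follows. The first function is the standard RED drop-probability ramp. The second is only a hypothetical illustration of how a predicted arrival rate might enter the calculation: the projection rule, the parameter names, and the units (packets, packets per second, seconds) are assumptions, and it is not the PAQM formula itself.

```python
def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Standard RED ramp: 0 below min_th, linear up to max_p, 1 at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def predictive_drop_probability(avg_queue, predicted_rate, capacity, horizon,
                                min_th, max_th, max_p):
    """Hypothetical variant: apply the RED ramp to the queue length projected
    `horizon` seconds ahead from the predicted arrival rate (assumed rule)."""
    projected = max(0.0, avg_queue + (predicted_rate - capacity) * horizon)
    return red_drop_probability(projected, min_th, max_th, max_p)
```

When the predicted arrival rate equals the link capacity, the projected queue equals the current average queue and the sketch reduces to plain RED, which mirrors the sense in which a predictive scheme generalizes RED.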
Exploitation of LRD for Resource and Traffic Control (Hou): We have investigated three theoretically
grounded  packet-pair  methods,  prediction,  reconstruction  and  interpolation,  for  measuring  cross  traffic  on 
the  bottleneck  link  of  an  end-to-end  path.  Essentially  back-to-back,  closely-spaced  packet  pairs  are  sent  at  a 
sender,  and  the  inter-arrival  times  of  packets  in  each  pair  are  measured  at  the  receiver  and  are  used  to  infer 
the  amount  of  cross  traffic  at  the  bottleneck  link.  The  objective  is  to  infer  cross  traffic  as  accurately  as 
possible,  while  not  injecting  a  significant  number  of  probe  packets  into  the  network.  To  achieve  this 
objective,  we  take  advantage  of  the  LRD  characteristic  of  cross  traffic  and  take  measurements  (i.e.,  send 
back-to-back packet pairs) only at time instants strategically determined by LMMSE prediction, signal
reconstruction,  and  interpolation.  We  have  conducted  simulation/empirical  (Internet)  studies  to  study  (i) 
whether  these  methods  give  good  mean  or  instantaneous  measurements  of  cross  traffic,  and  (ii)  whether  they 
are  adaptive  to  dynamic  changes  of  cross  traffic  and  are  robust  in  the  presence  of  multiple  bottleneck  links 
on  an  end-to-end  path. 
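
The inference step for a single packet pair can be sketched with the textbook dispersion relation. This is an illustration under stated assumptions, not the estimator studied in this project: it assumes the bottleneck capacity is known and that the bottleneck queue stays busy between the two probe packets, and the function names are hypothetical.

```python
def cross_traffic_bits(dispersion_s, capacity_bps, probe_bits):
    """Bits of cross traffic served between the two probes of one pair.

    During the receiver-side gap the bottleneck transmits
    capacity_bps * dispersion_s bits, of which probe_bits belong to the
    second probe packet. The remainder is attributed to cross traffic.
    """
    return max(0.0, capacity_bps * dispersion_s - probe_bits)


def cross_traffic_rate_bps(dispersion_s, capacity_bps, probe_bits):
    """Average cross-traffic rate over the gap of one packet pair."""
    return cross_traffic_bits(dispersion_s, capacity_bps, probe_bits) / dispersion_s
```

For example, a 12,000-bit probe on a 10 Mb/s bottleneck with a measured gap of 2 ms gives roughly 8,000 bits of cross traffic, or about 4 Mb/s over that gap.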


Real  Time  CORBA  Support  (Schmidt):  Developers  of  mission  critical  distributed  real-time  applications 
have  historically  used  low-level,  non-standard  network  programming  interfaces,  application-specific 
protocols,  and  customized  real-time  scheduling  mechanisms.  This  legacy  approach  mixes  real-time  network 
programming  throughout  application  programs,  which  makes  their  development  non-portable,  tedious,  and 
error-prone,  and  requires  highly  skilled  developers.  To  address  these  problems,  we  have  developed  and 
optimized  the  TAO  Object  Request  Broker  (ORB).  TAO  is  an  open-source,  standards-compliant 
implementation  of  the  Real  Time  CORBA  specification  that  is  being  used  on  hundreds  of  research  projects 
and  commercial  products  world-wide.  TAO  is  designed  to  meet  end-to-end  application  QoS  requirements 
by  vertically  integrating  distributed  object  computing  middleware  with  OS  I/O  subsystems,  network 
protocols,  and  network  interfaces. 


TAO was the first real-time ORB end system to support end-to-end QoS guarantees over high-speed
networks (such as ATM), embedded system interconnects (such as VME and Fibre Channel), and
QoS-enabled network protocols (such as IntServ, DiffServ, and SCTP).


The  following  is  a  synopsis  of  the  contributions  of  the  Real  Time  CORBA  portion  of  the  PERC  project: 

•  An  ORB  core  that  supports  deterministic  real-time  scheduling  and  dispatching  strategies.  The 
TAO  ORB  core  concurrency  models  minimize  context  switching,  synchronization,  dynamic  memory 
allocation,  and  data  movement. 

•  An  active  demultiplexing  strategy  that  associates  client  requests  with  target  objects  in  constant 
time,  regardless  of  the  number  of  objects  and  operations. 

• A highly-optimized CORBA GIOP protocol engine and an IDL compiler that generates compiled
or interpreted stubs and skeletons, which enables applications to make fine-grained time/space tradeoffs.
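
The constant-time lookup behind active demultiplexing can be sketched as follows. The idea is that the key handed to clients already contains the index of the target object's table slot, so no search or hashing is needed on the request path. This is a simplified illustration, not TAO's implementation: the class name, the tuple-shaped key, and the generation counter used to reject stale keys are assumptions.

```python
class ActiveObjectMap:
    """Maps object keys to servants with a direct table index."""

    def __init__(self):
        self._slots = []  # each slot is [generation, servant or None]
        self._free = []   # indices of slots available for reuse

    def bind(self, servant):
        """Register a servant and return its key (index, generation)."""
        if self._free:
            index = self._free.pop()
            self._slots[index][1] = servant
        else:
            index = len(self._slots)
            self._slots.append([0, servant])
        return (index, self._slots[index][0])

    def find(self, key):
        """Locate the servant in constant time, regardless of table size."""
        index, generation = key
        if not 0 <= index < len(self._slots):
            raise KeyError(key)
        slot_generation, servant = self._slots[index]
        if slot_generation != generation or servant is None:
            raise KeyError(key)  # key refers to an object that was removed
        return servant

    def unbind(self, key):
        """Remove a servant and invalidate every key issued for it."""
        self.find(key)  # raises KeyError for unknown or stale keys
        slot = self._slots[key[0]]
        slot[0] += 1
        slot[1] = None
        self._free.append(key[0])
```

The generation counter lets a slot be reused without an old key silently resolving to the new occupant.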

Load  Balancing  and  Fault  Tolerance  (Melliar-Smith  and  Moser):  Aditya  Singh,  Michael  Melliar-Smith 
and  Louise  Moser  developed  an  infrastructure,  based  on  the  TAO  CORBA  ORB,  developed  by  Schmidt 
and  Suda  at  UC  Irvine,  that  integrates  load  balancing  and  fault  tolerance. 


Figure: Integration of fault tolerance and load balancing, where peer groups are used for load balancing and
replica groups are used for fault tolerance. (Figure labels: Peer Group; Replication for Fault Tolerance.)
4.2 Protocols and Infrastructure for Supporting Fault Tolerance

Interception-Based Fault Tolerant CORBA Infrastructure (Moser and Melliar-Smith): Professors
Moser and Melliar-Smith and their students have designed and implemented the Eternal system that
provides fault tolerance for CORBA applications using the interception approach. The Eternal system
intercepts the IIOP messages that would have been sent by the CORBA ORB and, instead, multicasts them
to the replicas of an object using a reliable totally-ordered multicast protocol. The Eternal system filters
duplicate operations and controls the dispatching of operations to the multiple threads of an object. It logs
messages and checkpoints, transfers the state of a replica to a new or recovering replica, and replays
logged messages.
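
Two of the mechanisms described above, duplicate filtering and checkpoint-plus-log recovery, can be sketched in a few lines. This is a hypothetical illustration and not code from the Eternal system: the class names and the (source group, operation sequence number) identifier are assumptions.

```python
class DuplicateFilter:
    """Delivers each operation once, even if every replica of the sender issues it."""

    def __init__(self):
        self._seen = set()

    def accept(self, source_group, operation_seq):
        """Return True for the first copy of an operation, False for later copies."""
        key = (source_group, operation_seq)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


class MessageLog:
    """Keeps the latest checkpoint and the messages delivered since it was taken."""

    def __init__(self):
        self._checkpoint = None
        self._since_checkpoint = []

    def append(self, message):
        self._since_checkpoint.append(message)

    def checkpoint(self, state):
        """Record a new checkpoint; earlier messages are no longer needed."""
        self._checkpoint = state
        self._since_checkpoint = []

    def recovery_plan(self):
        """State to install in a new or recovering replica, then messages to replay."""
        return self._checkpoint, list(self._since_checkpoint)
```

Because the multicast protocol delivers messages in the same total order at every replica, replaying the logged messages on top of the checkpoint brings a recovering replica to the same state as the others.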


The  components  of  the  Eternal  system. 

Design  and  Implementation  of  a  Pluggable  Fault  Tolerant  CORBA  Infrastructure  (Moser  and 
Melliar-Smith):  Wenbing  Zhao,  Louise  Moser  and  Michael  Melliar-Smith  have  developed  a  Pluggable 
Fault-Tolerant  CORBA  infrastructure  (Pluggable  FT)  that  provides  fault  tolerance  for  CORBA  applications 
by  utilizing  the  Pluggable  Protocols  Framework  (PPF)  that  is  available  for  most  CORBA  ORBs.  The  fault 
tolerance  mechanisms,  except  for  the  Totem  group  communication  protocol,  are  integrated  into  the 
application  process's  address  space  using  the  PPF.  The  approach  does  not  require  modification  to  the 
CORBA  ORB,  and  requires  only  minimal  modification  to  the  application.  Moreover,  it  avoids  the  difficulty 
of  retrieving  and  assigning  the  ORB  state,  by  incorporating  the  fault  tolerance  mechanisms  into  the  ORB. 

The  Pluggable  Protocols  Framework  separates  the  messaging  and  network  protocols  from  other  parts  of  the 
ORB  core  and  from  the  application  objects.  The  PPF  allows  both  the  network  protocol  and  the  messaging 
protocol  to  be  replaced.  Thus,  the  PPF  makes  it  possible  to  develop  CORBA  applications  for,  and  to  deploy 
them  in,  environments  for  which  the  standard  IIOP  protocol  is  not  appropriate.  The  PPF  also  provides 
CORBA  applications  with  improved  quality  of  service  by  enabling  the  use  of  customized  protocols  that  are 
tailored  to  those  applications  and  their  environments. 
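
The separation that the PPF provides can be sketched as follows: the ORB core depends only on an abstract transport and a messaging function, so either can be replaced without touching the application. This is a hypothetical illustration and does not reproduce the API of any particular ORB. All class and function names are assumptions.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Network protocol plug-in: moves already-encoded messages."""

    @abstractmethod
    def send(self, payload: bytes) -> None:
        ...


class LoopbackTransport(Transport):
    """Stand-in transport that records what would have been sent."""

    def __init__(self):
        self.sent = []

    def send(self, payload: bytes) -> None:
        self.sent.append(payload)


class ProtocolRegistry:
    """Lets a deployment choose a transport by name at configuration time."""

    def __init__(self):
        self._factories = {}

    def register(self, name, factory):
        self._factories[name] = factory

    def create(self, name):
        return self._factories[name]()


def text_messaging(operation, args):
    """Stand-in messaging protocol: encodes a request as readable text."""
    return repr((operation, args)).encode("utf-8")


class OrbCore:
    """Issues requests through whichever messaging and transport plug-ins it is given."""

    def __init__(self, transport, messaging):
        self._transport = transport
        self._messaging = messaging

    def invoke(self, operation, args):
        self._transport.send(self._messaging(operation, args))
```

A fault tolerance infrastructure built this way would register a multicast transport under its own name and select it in place of the default one.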

In  the  Pluggable  Protocols  Infrastructure,  we  use  the  PPF  to  replace  the  network  protocol  with  the  Totem 
reliable  totally-ordered  multicast  protocol.  The  infrastructure  achieves  performance  that  is  better  than  that 
of  existing  Fault  Tolerant  CORBA  systems,  while  providing  strong  replica  consistency  and  supporting  a 
wider  range  of  applications. 

Totem  Redundant  Ring  Protocol  (Melliar-Smith  and  Moser):  Ruppert  Koch,  Michael  Melliar-Smith 
and  Louise  Moser  have  developed  the  Totem  Redundant  Ring  Protocol  (Totem  RRP)  that  supports  three 
different  network  replication  strategies:  Active,  Passive  and  a  hybrid  Active-Passive  network  replication 
strategy. 

In  active  replication,  all  messages  and  tokens  are  sent  over  all  networks  at  the  same  time.  All  data  are 
received  multiple  times.  The  bandwidth  consumption  increases  linearly  with  the  number  N  of  networks.  The 
maximum  throughput  equals  the  throughput  of  the  slowest  network.  The  system  is  able  to  mask  the  loss  of  a 
message on up to N-1 networks without any message retransmission delay. In passive replication, messages
are  sent  alternately  over  one  of  the  available  networks.  The  bandwidth  consumption  equals  the  bandwidth 
consumption  of  an  unreplicated  system.  In  the  fault-free  case,  the  maximum  throughput  equals  the  sum  of 
the  throughputs  of  all  networks.  If  one  of  the  networks  fails,  the  maximum  throughput  is  reduced.  If  a 
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message  is  lost,  Totem  must  wait  until  the  message  has  been  retransmitted.  Active-passive  replication  is  a 
mixture  of  active  replication  and  passive  replication.  Every  message  or  token  is  sent  over  K  networks 
simultaneously,  1<K<N.  The  bandwidth  consumption  increases  K-fold.  The  system  is  able  to  mask  the  loss 
of  a  message  on  up  to  K-l  networks  without  any  message  retransmission  delay. 
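
The trade-offs among the three replication strategies can be tabulated with a short Python sketch. The class and function names are illustrative assumptions and are not part of Totem RRP. The formulas are the fault-free figures stated above; the throughput of active-passive replication is left unset because the text gives no formula for it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicationProfile:
    """Fault-free characteristics of one network replication strategy."""
    bandwidth_factor: int             # copies of each message placed on the networks
    masked_losses: int                # losses hidden without a retransmission delay
    max_throughput: Optional[float]   # None where the text states no formula


def active(throughputs: List[float]) -> ReplicationProfile:
    n = len(throughputs)
    # Every message is sent over all N networks, so the slowest network limits throughput.
    return ReplicationProfile(n, n - 1, min(throughputs))


def passive(throughputs: List[float]) -> ReplicationProfile:
    # Messages alternate over the networks, so the capacities add up,
    # but a lost message must wait for retransmission.
    return ReplicationProfile(1, 0, sum(throughputs))


def active_passive(throughputs: List[float], k: int) -> ReplicationProfile:
    n = len(throughputs)
    if not 1 < k < n:
        raise ValueError("active-passive replication requires 1 < K < N")
    # Each message is sent over K networks at once.
    return ReplicationProfile(k, k - 1, None)
```

For three networks with throughputs of 100, 80 and 100 units, active replication is limited to 80 units while masking two losses, whereas passive replication reaches 280 units but masks none.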

Performance  measurements  show  that  a  system  using  the  Totem  RRP  with  Active  replication  experiences  a 
decrease  in  effective  throughput,  as  is  to  be  expected.  In  contrast,  in  a  system  that  uses  the  Redundant  Ring 
Protocol  with  Passive  replication,  the  throughput  exceeds  the  throughput  of  the  unreplicated  system  while 
being  more  resilient  to  network  faults. 

Latency  of  Replication  and  Multicast  Protocols  (Melliar-Smith  and  Moser):  Wenbing  Zhao,  Louise 
Moser  and  Michael  Melliar-Smith  have  performed  experimental  measurements  of  the  Pluggable  Fault 
Tolerant  CORBA  Infrastructure  and  a  reliable  totally-ordered  multicast  group  communication  protocol.  We 
measured  probability  density  functions  for  the  latency  from  a  request  transmitted  by  a  client,  through 
service  by  a  (possibly  replicated)  server,  to  reply  delivered  to  the  client.  These  measurements  show  that, 
while  the  logical  token  ring  protocol  does  achieve  excellent  throughput,  it  has  an  adverse  effect  on  latency. 
The 120 microsecond latency measured for an unreplicated server is increased to 600 to 900 microseconds
for  a  three-way  replicated  server.  With  an  alternative  rotating  sequencer  protocol,  the  latency  is  still  about 
400  microseconds  for  a  three-way  replicated  server.  This  increased  latency  is  not  significant  in  enterprise 
applications  operating  over  a  wide-area  network,  but  is  undesirable  in  real-time  embedded  applications. 

Based  on  insights  derived  from  these  measurements,  we  have  developed  a  new  replication  strategy  using 
semi-active  replication.  Messages  are  processed  immediately  at  the  primary  replica  and  that  message  order 
is communicated to the backup replicas so that they can mimic the behavior of the primary replica.
Considerable  care  is  required  to  ensure  that  a  faulty  processor  coupled  with  the  loss  of  messages  cannot 
leave  the  system  in  an  inconsistent  state.  Preliminary  measurements  show  a  latency  of  170  microseconds, 
which  we  hope  to  reduce  to  about  150  microseconds. 
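
The ordering idea behind semi-active replication can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the infrastructure described above: the class names and message identifiers are hypothetical, and the sketch omits the fault handling that the text notes requires considerable care.

```python
from typing import Dict, List, Set, Tuple


class Primary:
    """Processes each message as soon as it arrives and publishes the order."""

    def __init__(self) -> None:
        self.log: List[str] = []   # message ids in processing order

    def receive(self, msg_id: str) -> Tuple[int, str]:
        self.log.append(msg_id)
        # Order notice sent to the backups: (position in the order, message id).
        return (len(self.log) - 1, msg_id)


class Backup:
    """Processes messages only in the order announced by the primary."""

    def __init__(self) -> None:
        self.pending: Set[str] = set()    # received but not yet processed
        self.orders: Dict[int, str] = {}  # position -> message id
        self.log: List[str] = []

    def receive(self, msg_id: str) -> None:
        self.pending.add(msg_id)
        self._drain()

    def order(self, seq: int, msg_id: str) -> None:
        self.orders[seq] = msg_id
        self._drain()

    def _drain(self) -> None:
        # Process the next position only when both its order notice
        # and the message itself have arrived.
        while True:
            msg_id = self.orders.get(len(self.log))
            if msg_id is None or msg_id not in self.pending:
                return
            self.pending.remove(msg_id)
            self.log.append(msg_id)
```

The primary never waits for agreement before processing, which is where the latency saving comes from in this sketch; the backup reaches the same log even if messages and order notices arrive in a different order.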

Hub  Implemented  Protocols  (Melliar-Smith  and  Moser):  Ruppert  Koch,  Michael  Melliar-Smith  and 
Louise  Moser  have  investigated  novel  implementation  strategies  for  fault  tolerant  protocols.  There  have 
been  recent  efforts  to  move  the  protocol  stack  from  the  host  computer  to  the  network  interface  card.  We  are 
investigating  moving  parts  of  the  protocol  stack  one  step  further  to  the  Ethernet  switch.  It  is  still  necessary 
to  operate  a  simple  protocol  between  the  switch  and  the  host  computer,  currently  TCP,  but  many  of  the 
difficult  issues  are  handled  more  easily  at  the  switch,  including  multicasting,  message  ordering, 
retransmission,  and  flow  control.  The  primary  challenge  is  whether  it  is  possible  to  achieve  adequate 
performance  from  the  Ethernet  switch  while  it  performs  these  additional  functions.  Another  challenge  is 
replication  of  the  Ethernet  switch  to  avoid  a  single  point  of  failure,  while  still  ensuring  consistency  of  the 
state  of  the  two  switches. 

Fault  Tolerance  for  Three-Tier  Architectures  (Moser  and  Melliar-Smith):  Wenbing  Zhao,  Louise 
Moser  and  Michael  Melliar-Smith  have  developed  a  software  infrastructure  that  unifies  transactions  and 
replication  for  three-tier  architectures  and,  thus,  provides  high  availability  and  fault  tolerance  for  enterprise 
applications.  The  infrastructure  is  based  on  the  Fault  Tolerant  CORBA  and  CORBA  Object  Transaction 
Service  standards.  Within  the  middle-tier  at  the  server,  the  infrastructure  replicates  the  applications  to 
protect  the  business  logic  processing.  In  addition,  it  replicates  the  transaction  coordinator,  which  renders  the 
two-phase  commit  protocol  non-blocking  and,  thus,  avoids  potentially  long  service  disruptions  caused  by 
failure  of  the  coordinator.  The  infrastructure  implements  automatic  client-side  failover  mechanisms,  which 
guarantee  that  clients  know  the  outcome  of  the  requests  that  they  have  made.  Moreover,  it  handles  the 
interactions  between  the  applications  and  the  database  system  through  replicated  gateways  that  prevent 
duplicate  requests  from  reaching  the  database  system.  The  infrastructure  starts  the  transactions  at  the 
applications,  and  automatically  retries  aborted  transactions,  caused  by  process  or  communication  faults,  on 
behalf of the clients.

The fault tolerance infrastructure that unifies transactions and replication.


Evaluation  of  Fault  Detection  and  Recovery  Times  (Melliar-Smith  and  Moser):  Xiaoli  Chen,  Michael 
Melliar-Smith  and  Louise  Moser  developed  a  benchmark,  with  characteristics  generally  similar  to  NSWC's 
HiPerD  air  defense  application.  The  benchmark  consists  of  a  driver  and  a  measurement  process  on  one 
computer,  and  a  four-stage  pipeline  with  each  stage  implemented  on  a  different  computer.  Each  of  the  four 
stages was replicated using Eternal Systems' real-time fault tolerance replication infrastructure.

Measurements  were  made  for  the  time  to  fault  detection  and  recovery  with  all  operations  performed  at  the 
same  priority  level,  and  with  two  priority  levels  thus  allowing  a  small  proportion  of  the  operations  to  be 
high  priority  operations.  In  our  experiments,  it  became  apparent  immediately  that  the  primary  determinant 
of  the  fault  detection  and  recovery  time  was  the  10  ms  resolution  of  the  Linux  clock. 

The time through the four-stage pipeline had a mean of 2.654 ms and a maximum of 12.867 ms without
replication and fault tolerance, and a mean of 3.783 ms and a maximum of 12.441 ms with primary/backup
replication and fault tolerance. For high priority operations, the mean time was 3.878 ms and the
maximum was 4.909 ms. The fault detection and recovery time was 138 milliseconds, which is too long.
With a faster clock, the fault detection and recovery time would have been shorter.

TCP  Connection  Failover  for  Fault  Recovery  (Melliar-Smith  and  Moser):  Ruppert  Koch,  Sanjay 
Hortikar,  Louise  Moser  and  Michael  Melliar-Smith  developed  a  protocol  that  allows  failover  of  a  TCP 
server  endpoint  in  a  transparent  manner.  The  failover  can  occur  at  any  time  during  the  lifetime  of  the 
connection,  and  is  achieved  by  modifying  the  server's  TCP/IP  stack.  No  modifications  are  required  to  the 
client  application,  the  server  application,  the  client's  TCP/IP  stack,  or  the  network. 

The  essence  of  the  problem  is  that  failover  must  be  achieved  at  one  end  of  the  TCP  connection  without 
disturbing  the  other  end,  which  might  be  external  to  the  system  and  thus  might  not  be  running  any  TCP 
failover  software.  In  particular,  the  new  TCP  server  endpoint,  taking  over  after  the  failure  of  the  original 
TCP  server  endpoint,  must  continue  using  the  same  sequence  numbers,  acknowledge  messages  already 
received,  and  retransmit  messages  for  which  acknowledgments  are  not  received. 

Thus,  the  standby  server  TCP  endpoint  must  receive  every  message  that  the  primary  server  TCP  endpoint 
transmits  or  receives,  which  is  achieved  by  having  the  primary  server  TCP  endpoint  forward  all  such 
messages  to  the  standby  server  TCP  endpoint.  Fortunately,  the  TCP  stack  contains  a  mechanism  for 
inserting  additional  modules  into  the  stack  that  can  be  used  to  achieve  this  effect.  The  TCP  stack  also 
contains  mechanisms  for  setting,  during  startup,  the  initial  sequence  numbers,  which  can  be  used  to  ensure 
that  the  standby  server  TCP  endpoint  uses  the  correct  sequence  numbers  when  it  takes  over. 
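
The bookkeeping that the standby endpoint needs can be sketched as follows. This is a simplified illustration, not the protocol implementation described above: the names `Segment`, `StandbyEndpoint`, `observe` and `take_over` are hypothetical, segments are assumed to be forwarded in order, and connection teardown, windows and out-of-order data are ignored.

```python
from dataclasses import dataclass

SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


@dataclass
class Segment:
    from_server: bool  # True if sent by the primary server, False if sent by the client
    seq: int           # sequence number of the first payload byte
    length: int        # number of payload bytes
    ack: int           # cumulative acknowledgment carried by the segment


class StandbyEndpoint:
    """Tracks connection state from segments forwarded by the primary."""

    def __init__(self, server_isn: int, client_isn: int) -> None:
        # The SYN in each direction consumes one sequence number.
        self.snd_nxt = (server_isn + 1) % SEQ_MOD  # next byte the server will send
        self.snd_una = self.snd_nxt                # oldest byte not yet acknowledged
        self.rcv_nxt = (client_isn + 1) % SEQ_MOD  # next byte expected from the client

    def observe(self, seg: Segment) -> None:
        end = (seg.seq + seg.length) % SEQ_MOD
        if seg.from_server:
            if seg.seq == self.snd_nxt:
                self.snd_nxt = end
            return
        if seg.seq == self.rcv_nxt:
            self.rcv_nxt = end
        # Accept a client acknowledgment only if it covers data already sent.
        outstanding = (self.snd_nxt - self.snd_una) % SEQ_MOD
        if (seg.ack - self.snd_una) % SEQ_MOD <= outstanding:
            self.snd_una = seg.ack

    def take_over(self) -> dict:
        # State the standby needs in order to continue the connection.
        return {
            "next_seq": self.snd_nxt,
            "ack": self.rcv_nxt,
            "bytes_to_retransmit": (self.snd_nxt - self.snd_una) % SEQ_MOD,
        }
```

On takeover, the standby continues with the same sequence numbers, acknowledges what the client has already sent, and knows how many bytes are still unacknowledged and may need retransmission.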

TCP  connection  failover  is,  of  course,  only  an  adjunct  to  failover  of  an  application  program.  If  the 
application  program  is  using  passive  replication  with  restart  from  a  checkpoint,  the  application  program 
needs  to  have  messages  replayed  from  a  log,  and  will  generate  messages  that  have  already  been  sent  and 
that  must  be  suppressed.  The  TCP  connection  failover  mechanisms  at  the  standby  server  TCP  endpoint 
include  mechanisms  for  buffering  and  logging  messages.  These  mechanisms  must  be  integrated  with 
message logging and replay mechanisms for the application.

Consistent File Replication (Melliar-Smith and Moser): Rachit Chawla, Michael Melliar-Smith and
Louise  Moser  investigated  strategies  for  consistent  file  replication  that  maintain  consistency  between  the 
states  of  files  and  the  states  of  the  application  programs  that  use  those  files,  thus  simplifying  recovery  from 
faults.  We  have  considered  four  application  availability  policies: 

•  Simple  restart 

•  Checkpoint  restore 

•  Passive  replication 

•  Active  replication. 

For  each  of  these  policies,  we  have  investigated  corresponding  file  replication  strategies. 

Consistent  Time  Service  for  Fault-Tolerant  Distributed  Systems  (Melliar-Smith  and  Moser):  Clock- 
related  operations  are  one  of  the  many  sources  of  replica  non-determinism  and  of  replica  inconsistency  in 
fault-tolerant  distributed  systems.  In  passive  replication,  if  the  primary  server  crashes,  the  next  clock  value 
returned  by  the  new  primary  server  might  have  actually  rolled  back  in  time,  which  can  lead  to  undesirable 
consequences  for  the  replicated  application.  The  same  problem  can  happen  for  active  replication  when  the 
result  of  the  first  replica  to  respond  is  taken  as  the  next  clock  value. 

Wenbing  Zhao,  Louise  Moser  and  Michael  Melliar-Smith  have  designed  and  implemented  a  Consistent 
Time  Service  for  fault-tolerant  distributed  systems.  The  Consistent  Time  Service  introduces  a  group  clock 
that  is  consistent  across  the  replicas  and  ensures  the  determinism  of  the  replicas  with  respect  to  clock- 
related  operations.  The  group  clock  is  monotonically  increasing,  transparent  to  the  application,  and  fault- 
tolerant.  The  Consistent  Time  Service  guarantees  the  consistency  of  the  group  clock  even  when  faults 
occur,  new  replicas  are  added  to  the  group,  and  failed  replicas  recover. 
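
The rollback problem and the role of a group clock can be illustrated with a small sketch. It assumes that every clock-related operation is decided by the first proposal delivered in an agreed total order, and that each replica keeps an offset between its physical clock and the group clock; these details and the names below are assumptions made for illustration, not a description of the implemented service.

```python
from typing import Optional


class GroupClockReplica:
    """One replica's view of the group clock (illustrative names)."""

    def __init__(self) -> None:
        self.offset = 0                    # group clock minus local physical clock
        self.last: Optional[int] = None    # last group clock value returned

    def propose(self, physical: int) -> int:
        # Each replica proposes its physical clock corrected by its current offset.
        return physical + self.offset

    def adopt(self, physical: int, decided: int) -> int:
        # 'decided' is the first proposal delivered in the agreed order,
        # so every replica adopts the same value for this clock operation.
        if self.last is not None and decided <= self.last:
            decided = self.last + 1        # never let the group clock roll back
        self.offset = decided - physical
        self.last = decided
        return decided
```

If the primary's physical clock reads 1000 and a backup's reads 900, the backup records an offset of 100 when the value 1000 is adopted. After the primary crashes, the backup's next proposal at physical time 905 is 1005 rather than 905, so the clock seen by the application does not roll back.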

A  Reservation  Protocol  for  Business  Activities  That  Span  Multiple  Enterprises  (Moser  and  Melliar- 
Smith):  Traditional  transaction  semantics  are  not  appropriate  for  business  activities  that  involve  long- 
running  transactions  in  a  loosely  coupled  distributed  environment,  in  particular,  for  Web  Services  that 
operate  between  different  enterprises  over  the  Internet.  Extended  transaction  models  have  been  developed 
to  coordinate  business  activities,  but  they  rely  on  compensating  transactions  to  achieve  data  consistency  and 
atomicity  of  tasks  in  the  presence  of  faults.  Even  though  the  state  change  resulting  from  a  particular 
committed  task  might  be  reversed  by  a  compensating  transaction,  it  is  difficult  to  cancel  the  side  effects  of  a 
committed  task  because  other  transactions  might  see  the  results  before  the  compensating  transaction  is 
applied.  Identifying  such  transactions  and  compensating  all  of  them  is  difficult. 

Wenbing  Zhao,  Louise  Moser  and  Michael  Melliar-Smith  have  developed  a  novel  reservation-based 
extended  transaction  protocol  that  can  be  used  to  coordinate  such  business  activities.  The  protocol  avoids 
the  use  of  compensating  transactions,  which  can  result  in  undesirable  effects.  In  our  protocol,  each  task 
within  a  business  activity  is  executed  as  two  steps.  The  first  step  involves  an  explicit  reservation  of 
resources.  The  second  step  involves  the  confirmation  or  cancellation  of  the  reservation.  Each  step  is 
executed  as  a  separate  traditional  short-running  transaction.  We  have  investigated  how  our  protocol  can  be 
implemented  as  a  reservation  protocol  on  top  of  the  Web  Services  Transaction  specification  or, 
alternatively,  as  a  coordination  protocol  on  top  of  the  Web  Services  Coordination  specification. 
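
The two-step structure of the reservation protocol can be sketched as follows. The `Inventory` class and `run_activity` function are hypothetical names introduced for illustration; each method call stands for one short-running transaction, and the sketch does not model the Web Services specifications mentioned above.

```python
from typing import Dict, List, Optional, Tuple


class Inventory:
    """A resource manager that supports explicit reservations."""

    def __init__(self, available: int) -> None:
        self.available = available
        self.reserved: Dict[int, int] = {}  # reservation id -> quantity held
        self._next_id = 0

    def reserve(self, qty: int) -> Optional[int]:
        # Step 1: set the resources aside in a short transaction.
        if qty > self.available:
            return None
        self.available -= qty
        rid = self._next_id
        self._next_id += 1
        self.reserved[rid] = qty
        return rid

    def confirm(self, rid: int) -> None:
        # Step 2a: the reserved resources are consumed.
        self.reserved.pop(rid)

    def cancel(self, rid: int) -> None:
        # Step 2b: the reserved resources are returned.
        self.available += self.reserved.pop(rid)


def run_activity(requests: List[Tuple[Inventory, int]]) -> bool:
    """Reserve every resource first, then confirm all or cancel all."""
    held: List[Tuple[Inventory, int]] = []
    for inventory, qty in requests:
        rid = inventory.reserve(qty)
        if rid is None:
            for held_inventory, held_rid in held:
                held_inventory.cancel(held_rid)
            return False
        held.append((inventory, rid))
    for inventory, rid in held:
        inventory.confirm(rid)
    return True
```

Because a reservation is only a hold, no other business activity ever sees a committed result that later has to be undone, which is the situation that makes compensating transactions difficult.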

Analysis  of  the  Reservation  Protocol  Strategy  (Melliar-Smith  and  Moser):  We  have  conducted  several 
availability  and  performance  analyses  in  order  to  evaluate  the  reservation  protocol  strategy  in  comparison 
with  the  abort  of  transactions  with  compensating  transactions  strategy  and  the  distributed  transactions  with 
two-phase  commit  strategy. 

The  probability  that  the  databases  are  left  in  a  potentially  inconsistent  state  is  an  important  availability 
metric.  The  figure  shows  the  probabilities  of  potential  inconsistency  for  (a)  the  abort  of  transactions  with 
compensating  transactions  strategy,  and  (b)  the  reservation  protocol  strategy.  The  reservation  protocol  has 
superior  performance  because  there  are  fewer  additional  transactions  for  each  fault  recovery.  It  is  our 
assessment  that  the  difference  in  the  probabilities  that  the  database  is  left  in  a  potentially  inconsistent  state 
presents  a  decisive  advantage  for  the  reservation  protocol  strategy.  If  the  distributed  transactions  with  two- 
phase  commit  strategy  is  used,  instead  of  the  abort  of  transactions  with  compensating  transactions  strategy, 
the  risk  of  inconsistency  is  reduced  but  there  is  an  increased  risk  that  a  fault  will  cause  data  to  be  locked  for 
an arbitrarily long period of time.

The probability that the database is left in a potentially inconsistent state after m business activities for
(a)  the  abort  of  transactions  with  compensating  transactions  on  the  left,  and  (b)  the  reservation  protocol  strategy 
on  the  right. 

The probability density function (pdf) for the duration of a business activity with delays due to lock
contention is also an important metric. The figure shows the pdfs for the duration of a business activity for
different  values  of  lock  contention  for  both  transactional  locking  and  the  reservation  protocol.  When  the 
probability  of  lock  contention  is  low,  the  pdfs  for  the  duration  of  a  business  activity  using  transactional 
locking  are  substantially  as  expected,  and  the  effects  of  delays  due  to  contention  for  a  single  lock  and  for 
two  locks  are  clearly  visible.  As  the  probability  of  contention  for  a  lock  increases,  the  business  activities 
are  delayed,  locks  are  held  longer,  delays  due  to  lock  contention  are  longer,  and  the  probability  that  a 
business  activity  claims  a  lock  that  is  already  held  by  another  business  activity  increases.  The  resulting  pdfs 
have  long  tails  and,  thus,  there  is  a  high  probability  of  lengthy  delays  for  the  business  activity.  It  is  worth 
noting  that,  for  transactional  locking,  there  are  probabilities  for  lock  contention  for  which  the  system  is  not 
stable,  representing  unbounded  delays  and  essentially  no  progress  for  the  business  activity.  Such  lock 
contention  and  instability  lead  to  system  collapse  under  heavy  load,  which  often  occurs  at  the  most 
inappropriate  times  during  the  most  important  tasks.  This  observation  underscores  the  importance  of 
determining,  and  enforcing,  an  admission  control  limit  for  the  business  activities. 

Also  shown  in  the  figure  are  pdfs  for  the  duration  of  a  business  activity  for  the  reservation  protocol.  It  is 
evident  that  even  high  probabilities  of  contention  for  locks  do  not  result  in  substantial  delays  for  the 
business  activity,  because  locks  are  held  only  briefly  during  the  reservation  subtask  and  are  not  held  for  the 
full  duration  of  the  business  activity.  In  summary,  the  reservation  protocol  strategy  is  more  resilient  to  high 
loads  and  high  probabilities  of  lock  contention  than  the  transactional  locking  strategy. 

If,  instead  of  distributed  transactions  with  two-phase  commit,  only  local  transactions  and  compensating 
transactions  are  used,  the  reservation  of  part  of  the  resource  is  simpler.  The  concurrency  achieved  is 
substantially  equivalent  to  that  of  the  reservation  protocol  strategy. 


The  probability  density  functions  for  the  duration  of  a  business  activity  with  delays  due  to  lock  contention. 

The  Heisenbug  Problem  (Melliar-Smith  and  Moser):  Transient  errors  of  unknown  cause,  sometimes 
called  Heisenbugs,  are  a  perplexing  problem  in  computer  systems.  Heisenbugs  are  significantly  more 
frequent,  up  to  30  times  more  frequent,  than  permanent  faults  or  Bohrbugs,  and  are  the  primary  determinant 
of  the  reliability  of  a  computer  system.  Because  Heisenbugs  are  difficult  to  reproduce,  it  is  difficult  to 
identify  the  cause  of  these  errors  and  to  remove  them  from  the  computer  system. 

Michael  Melliar-Smith  and  Louise  Moser  have  investigated  four  strategies  for  addressing  the  Heisenbug 
problem: 


•  Acceptance/checkpoint  strategy 

•  Transactional  strategy 

•  Strategy  employing  consistent  diversity  in  time 

•  Strategy  employing  consistent  diversity  in  space 

The  most  appropriate  strategy  to  employ  depends  on  the  nature  of  the  application  and  the  environment  in 
which  it  runs. 

4.3  Methodology  and  Protocols  for  Supporting  Security 

Intrusion  Detection  for  Mobile  Ad  Hoc  Networks  (Levitt).  Our  research  in  this  area  comprises  a 
specification-based  IDS  for  the  AODV  protocol,  a  specification-based  IDS  for  the  OLSR  protocol,  an 
intrusion  detection  architecture  for  MANETs,  and  a  distributed  event-driven  message  exchange  model  for 
MANET  IDS.  We  discuss  each  of  these  topics  in  more  detail  below. 

•  Specification-based  IDS  for  the  AODV  Protocol.  The  Ad-Hoc  On-Demand  Distance  Vector 
(AODV)  Routing  protocol,  designed  for  mobile  ad  hoc  networks,  offers  quick  adaptation  to 
dynamic  link  conditions,  low  processing  and  memory  overhead,  and  low  network  utilization. 
However,  because  the  protocol  design  did  not  address  security  issues,  AODV  is  vulnerable  to 
various  kinds  of  attacks.  In  our  research,  we  analyzed  some  of  the  vulnerabilities,  specifically 
addressing  attacks  against  AODV  that  manipulate  the  routing  messages,  and  have  proposed  a 
solution  based  on  the  specification-based  intrusion  detection  technique  to  detect  attacks  on  AODV. 
Briefly,  our  approach  involves  the  use  of  finite  state  machines  for  specifying  correct  AODV 
routing behavior, and distributed network monitors for detecting run-time violations of the
specifications.  We  propose  one  additional  field  in  the  protocol  message  to  enable  the  monitoring. 
Our  algorithm  can  effectively  detect  most  of  the  serious  attacks  in  real  time  and  with  minimum 
overhead. 

• Specification-based IDS for the OLSR Protocol. The Optimized Link State Routing (OLSR)
protocol,  developed  at  INRIA,  France,  is  another  popular  routing  protocol.  Unlike  AODV,  this 
protocol  is  a  proactive  table-driven  link-state  protocol  that  has  been  modified  for  the  MANET 
environment.  Local  Hello  beacons  and  a  multi-point  relay  system  are  employed  to  obtain  local 
state  information  at  a  node  and  disseminate  it  optimally  around  the  network.  In  our  research,  we 
conducted an extensive analysis of the vulnerabilities in the OLSR routing protocol, and developed
an  OLSR  IDS  model.  We  established  a  set  of  security  requirements  that  must  be  satisfied  in 
protocol  behavior.  We  established  a  set  of  detection  rules  that  strictly  enforce  the  identified 
security  requirements  during  protocol  execution.  We  use  GloMoSim,  a  scalable  simulation 
environment  for  ad  hoc  networks,  to  implement  the  detection  rules  and  example  attacks,  and  to 
identify  false  alarms  and  exceptions.  This  simulation  is  coupled  with  ongoing  formal  reasoning  and 
proof  methodology  to  establish  that  the  specifications  satisfy  the  security  policy  under  various 
assumptions. 

•  Intrusion  Detection  Architecture  for  MANETs.  Intrusion  detection  in  MANETs  is  challenging 
because  these  networks  change  their  topologies  dynamically,  lack  concentration  points  where 
aggregated  traffic  can  be  analyzed,  utilize  infrastructure  protocols  that  are  susceptible  to 
manipulation,  and  rely  on  noisy,  intermittent  wireless  communication.  We  have  developed  a 
cooperative,  distributed  intrusion  detection  architecture  that  addresses  these  challenges  while 
facilitating  accurate  detection  of  MANET-specific  and  conventional  attacks.  The  architecture  is 
organized  as  a  dynamic  hierarchy  in  which  detection  data  is  acquired  at  the  leaves  and  is 
incrementally  aggregated,  reduced,  and  analyzed  as  it  flows  upward  toward  the  root.  Security 
management  directives  flow  downward  from  the  nodes  at  the  top.  To  maintain  communication 
efficiency,  the  hierarchy  is  automatically  reconfigured  as  needed  using  clustering  techniques  in 
which  clusterheads  are  selected  based  on  topology  and  other  criteria.  The  utility  of  the  architecture 
is  illustrated  via  multiple  attack  scenarios. 

•  Distributed  Event-driven  Message  Exchange  Model  for  MANET  IDS.  Numerous  distributed 
intrusion detection systems have been proposed to detect abnormal routing behaviors in MANETs.
Most of these systems simply assume that each detector has complete information from peers instead
of building an efficient message exchange framework among detectors. To solve this problem, we
propose a Distributed Event-driven Message Exchange Model (DEMEM) that provides sufficient
data for intrusion detection in MANETs. In DEMEM, detectors residing in each mobile node
intercept  routing  messages  of  the  node  and  exchange  information  with  other  detectors  only  when 
they  have  new  routing  information.  This  event-driven  approach  can  avoid  both  costly  promiscuous 
monitoring  and  modifying  routing  protocols.  We  demonstrated  DEMEM  on  a  specification-based 
intrusion  detection  model  that  precisely  detects  insider  routing  attacks  in  Optimized  Link  State 
Routing,  while  only  requiring  2-hop  neighbors  to  exchange  their  Hello  messages.  DEMEM  has 
additional  ID  Request  messages  to  eliminate  false  positives  caused  by  lost  messages.  We  validated 
our  approach  with  formal  analyses  showing  that  DEMEM  ensures  sufficient  information  is  provided 
for  each  detector  in  a  distributed  OLSR  IDS.  In  addition,  we  conducted  simulation  experiments  in 
GloMoSim  demonstrating  low  message  overhead,  no  false  negatives  and  very  low  false  positives. 
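
The event-driven exchange described for DEMEM can be sketched as follows. The sketch is illustrative only: the `Detector` class and its methods are hypothetical names, and the reciprocity check (a node that lists a neighbor should appear in that neighbor's own list) is an assumed example of a detection rule, not the rule set used in the work described above.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Report = Tuple[str, FrozenSet[str]]  # (node id, neighbors listed in its Hello)


class Detector:
    """Detector attached to one node; it reports only when routing data changes."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.last_sent: Optional[FrozenSet[str]] = None
        self.view: Dict[str, FrozenSet[str]] = {}  # latest neighbor list per node

    def on_local_hello(self, neighbors: Iterable[str]) -> Optional[Report]:
        # Intercept the Hello sent by the local node.
        current = frozenset(neighbors)
        self.view[self.node] = current
        if current == self.last_sent:
            return None  # nothing new, so no detector message is sent
        self.last_sent = current
        return (self.node, current)

    def on_peer_report(self, report: Report) -> None:
        node, neighbors = report
        self.view[node] = neighbors

    def unreciprocated_claims(self) -> List[Tuple[str, str]]:
        # Assumed example rule: flag (a, b) when a lists b but b does not list a.
        claims = []
        for a, neighbors in self.view.items():
            for b in neighbors:
                if b in self.view and a not in self.view[b]:
                    claims.append((a, b))
        return sorted(claims)
```

Repeated Hello messages with unchanged content generate no detector traffic in this sketch, which mirrors the stated goal of avoiding both promiscuous monitoring and changes to the routing protocol.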

Formal Reasoning and Verification (Levitt). In this research area, we have performed formal reasoning
and  verification  about  intrusion  detection  systems,  and  also  about  a  specification-based  intrusion  detection 
for  dynamic  auto-configuration  protocols  in  ad  hoc  networks. 

•  A  Specification-Based  Intrusion  Detection  for  Dynamic  Auto-Configuration  Protocols  in  Ad 
Hoc  Networks.  In  this  research  topic,  we  have  addressed  security  issues  of  auto-configuration 
protocols  in  ad  hoc  networks.  Auto-configuration  protocols  enable  nodes  to  obtain  configuration 
information  so  that  they  can  communicate  with  other  nodes  in  the  network.  We  have  described  a 
formal  approach  to  modeling  and  reasoning  about  auto-configuration  protocols  to  support  the 
detection  of  malicious  insider  nodes.  Our  approach  defines  a  global  security  requirement  for  the 
subnet  with  respect  to  the  protocol  and  involves  the  analysis  of  the  local  detection  rules  that 
characterize  the  good  behavior  of  individual  nodes.  The  local  detection  rules  define  a  distributed 
specification-based  intrusion  detection  system  to  detect  malicious  insider  nodes.  Novel  to  the  field  of 
intrusion  detection  is  a  formal  proof  that  the  local  detection  rules  ensure  the  global  security 
requirement. In principle, the proven rules can detect any attack, even an unknown attack, that imperils
the global security requirement. The proof rests on assumptions that describe the security-imperiling
behavior that is ruled out; these assumptions, which can be strengthened or relaxed as the threat
changes, simplify the intrusion detection system by excluding attacks that are unlikely.

• Intrusion Detection Systems. We have developed a formal framework for the analysis of intrusion
detection  systems  (IDS)  that  employ  declarative  rules  for  attack  recognition,  e.g.,  specification-based 
intrusion  detection.  Our  approach  allows  reasoning  about  the  effectiveness  of  an  IDS.  A  formal 
framework  is  built  with  the  theorem  prover  ACL2  to  analyze  and  improve  detection  rules  of  IDSs. 
System  Health  and  Intrusion  Monitoring  (SHIM)  is  used  as  an  example  specification-based  IDS  to 
validate  our  approach.  We  formalized  specifications  of  a  host-based  IDS  in  SHIM  which,  together 
with  a  trusted  file  policy,  enables  us  to  reason  about  the  soundness  and  completeness  of  the 
specifications  by  proving  that  the  specifications  satisfy  the  policy  under  various  assumptions.  These 
assumptions  are  properties  of  the  system  that  are  not  checked  by  the  IDS.  Analysis  of  these 
assumptions  shows  the  beneficial  role  of  SHIM  in  improving  the  security  of  the  system.  The  formal 
framework  (see  the  figure)  and  analysis  methodology  provide  a  scientific  basis  for  arguing  that  an 
IDS can detect known and unknown attacks because the IDS detects all attacks that violate a policy.

Automated Response (Levitt). Our research in this area comprises a network-based response framework
and  an  automatic  response  model  for  MANETs.  These  topics  are  summarized  below. 

• Network-Based Response Framework. Distributed Denial of Service (DDoS) attacks target the
availability of servers and routers, resulting in a severe loss of connectivity. We have developed a
distributed, automated response model that uses a Proportional-Integral-Derivative (PID) controller to
manage traffic flow. This model is designed to prevent incoming traffic from
exceeding a given threshold, while admitting as much legitimate incoming traffic as possible. In
addition, this model requires less demanding modifications to external routers and
networks than other published distributed response models that mitigate the effect of DDoS attacks.

•  Automatic  Response  Model  for  MANETs.  We  have  addressed  the  problem  of  automatic  response 
to  attacks  on  MANETs,  which  is  a  challenging  problem  because  the  topologies  of  these  networks  are 
dynamic  and  the  fully  distributed  cooperation  among  nodes  makes  not  only  the  false  positives  of 
intrusion  alarms  high  but  also  the  response  analysis  complicated.  We  have  developed  a  cooperative, 
distributed  and  automatic  response  model  that  addresses  these  challenges  by  correlating  intrusion 
alarms  and  then  deciding  which  response  among  a  set  of  feasible  responses  is  the  best  one  to  take. 
Our  work  focuses  on  automatic  response  to  “atomic”  routing  message  attacks  of  insiders,  based  on 
an  attack  taxonomy  that  we  developed  to  provide  structure  to  the  ways  an  insider  can  deleteriously 
impact  routing.  Four  basic  response  types  can  be  effected  in  our  system:  isolation,  relocation, 
encryption  and  routing  recovery.  The  decision  on  which  response  to  take  is  based  on  the  criticality  of 
the  node’s  position  in  the  topology  and  on  the  attack.  To  determine  the  importance  of  a  node’s 
position  in  a  network,  we  have  proposed  a  new  concept,  topology  dependency,  to  represent  the 
positional  relationship  between  each  pair  of  neighboring  nodes.  The  utility  of  this  general  response 
model  is  illustrated  via  numerous  atomic  attack  scenarios. 
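
The role of a Proportional-Integral-Derivative controller in keeping incoming traffic near a threshold can be sketched as follows. This is a textbook PID update, not the response model described above; the gains, the traffic figures and the helper `next_rate_limit` are assumptions chosen for illustration.

```python
from typing import Optional


class PIDController:
    """Textbook PID controller driving a measured value toward a setpoint."""

    def __init__(self, kp: float, ki: float, kd: float, setpoint: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint            # traffic threshold to hold
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def update(self, measured: float, dt: float) -> float:
        if dt <= 0:
            raise ValueError("dt must be positive")
        error = self.setpoint - measured    # negative when traffic exceeds the threshold
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_rate_limit(current_limit: float, correction: float,
                    floor: float = 0.0, ceiling: float = float("inf")) -> float:
    # Apply the controller output to a rate limit, clamped to a valid range.
    return min(max(current_limit + correction, floor), ceiling)
```

When measured traffic is above the threshold, the controller output is negative and the rate limit is tightened; as traffic falls back toward the threshold, the correction shrinks, so legitimate traffic is not throttled more than necessary.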

Worm Defense (Levitt). Worm and worm-defense research needs a way to reproduce live environments
faithfully. We have used a network testbed called EMULAB to satisfy this need and to evaluate our
framework for worm-defense evaluation within enterprise networks, i.e., networks with a few tens to a
few thousand nodes. We have described an implementation of the framework and used it to evaluate
an example defense strategy, but we emphasize that the framework can
support  many  different  defense  strategies.  The  framework  is  encapsulated  in  an  API.  This  API  accepts  a 
topology  description  and  a  description  of  the  defense  system,  and  evaluates  the  defense  system  against 
worms.  The  worms  can  be  characterized  by  a  specification  or  operationally  by  a  worm  program. 

Unreliable  Byzantine  Fault  Detectors  for  Solving  Consensus  (Moser  and  Melliar-Smith).  Unreliable 
fault  detectors  can  be  defined  in  terms  of  completeness  and  accuracy  properties,  and  can  be  used  to  solve 
the  consensus  problem  in  asynchronous  distributed  systems  that  are  subject  to  crash  faults. 

Kim  Kihlstrom,  Louise  Moser  and  Michael  Melliar-Smith  have  extended  this  result  to  asynchronous 
distributed  systems  that  are  subject  to  Byzantine  faults.  First,  we  defined  and  categorized  Byzantine  faults. 
We  then  defined  two  new  completeness  properties,  eventual  strong  completeness  and  eventual  weak 
completeness.  We  used  these  completeness  properties  and  previously  defined  accuracy  properties  to  define 
four  new  classes  of  unreliable  Byzantine  fault  detectors.  Then,  we  developed  an  algorithm  that  uses  a 
Byzantine  fault  detector  to  solve  the  consensus  problem  in  an  asynchronous  distributed  system  of  n 
processes in which the number k of Byzantine faults satisfies k ≤ (n-1)/3. We also
developed  algorithms  that  implement  a  Byzantine  fault  detector  in  a  model  of  partial  synchrony.  We  proved 
the  correctness  of  the  consensus  algorithm  and  analyzed  its  complexity. 
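
As a small worked illustration of the resilience bound above, the following sketch computes the largest number k of Byzantine faults that satisfies k ≤ (n-1)/3 for a given number n of processes. The function name is hypothetical and is not part of the reported algorithm.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest integer k with k <= (n - 1) / 3 for a system of n processes."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 1) // 3


for n in (4, 7, 10):
    print(n, max_byzantine_faults(n))
```

Under this bound, tolerating one Byzantine fault requires at least four processes, and tolerating two requires at least seven.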

Checkpointing  and  Logging  for  Intrusion  Analysis  and  Recovery  (Moser  and  Melliar-Smith). 

Intrusions  often  occur  because  of  operating  system  and  application  program  vulnerabilities.  It  is  important 
to  analyze  thoroughly  how  an  intrusion  occurred  so  that  the  related  vulnerabilities  can  be  found  and  fixed 
quickly. Unfortunately, audit records produced through traditional system logging do not contain enough
information to establish with certainty how the system was compromised and what damage was done. Such
logs are even less useful for recovering from an intrusion.

Recently,  researchers  have  started  to  employ  techniques  that  have  been  used  in  the  fault  tolerance 
community,  such  as  logging  of  external  inputs  and  non-deterministic  events  in  order  to  replay  the  execution 
of the software, to facilitate better intrusion analysis. Wenbing Zhao, Louise Moser and Michael
Melliar-Smith have designed a software infrastructure for intrusion analysis and recovery. In this infrastructure,
intrusion analysis is facilitated by deterministic logging and replay. Recovery from intrusion is carried out
by  rolling  back  to  the  last  checkpoint  before  the  intrusion  occurred. 
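
The interplay of logging, replay and checkpoint rollback can be sketched as follows. This is a minimal toy model under our own assumptions, not the infrastructure itself: the class `ReplayableService`, its key/value state and the notion of a log position are all hypothetical.

```python
import copy


class ReplayableService:
    """Toy deterministic state machine with input logging and checkpoints."""

    def __init__(self):
        self.state = {}
        self.log = []          # every external input, in arrival order
        self.checkpoints = []  # (log position, snapshot of the state)

    def apply(self, key, value):
        """Log the external input, then apply it deterministically."""
        self.log.append((key, value))
        self.state[key] = value

    def checkpoint(self):
        """Snapshot the state together with the current log position."""
        self.checkpoints.append((len(self.log), copy.deepcopy(self.state)))

    def replay(self, upto):
        """Rebuild the state after the first `upto` logged inputs (analysis)."""
        state = {}
        for key, value in self.log[:upto]:
            state[key] = value
        return state

    def rollback_before(self, intrusion_index):
        """Restore the last checkpoint taken at or before the log position
        of the first malicious input, and truncate the log to match."""
        candidates = [c for c in self.checkpoints if c[0] <= intrusion_index]
        if not candidates:
            raise RuntimeError("no checkpoint precedes the intrusion")
        position, snapshot = candidates[-1]
        self.state = copy.deepcopy(snapshot)
        self.log = self.log[:position]
        return position
```

Replay reconstructs the state at any point in the log for analysis, while rollback discards everything after the chosen checkpoint, including legitimate inputs that arrived after it.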

The  AODVSTAT  Intrusion  Detection  Framework  (Belding-Royer).  Because  a  complete  security 
system  needs  both  intrusion  prevention  and  intrusion  detection  techniques,  we  have  developed  an  intrusion 
detection  framework,  called  AODVSTAT,  that  provides  intrusion  detection  to  ad  hoc  networks  running  the 
AODV  routing  protocol.  AODVSTAT  can  detect  intrusions  both  locally  and  globally  within  the  ad  hoc 
network.  Experiments  have  shown  that  our  solution  provides  effective  intrusion  detection  functionality 
using  a  limited  amount  of  resources.  Our  documented  version  of  AODVSTAT  is  now  freely  available  for 
download  on  the  Internet. 

As  another  part  of  this  work,  we  developed  a  classification  model  for  attacks  against  ad  hoc  networks.  We 
use  a  three-dimensional  model  in  the  form  of  a  cuboid  to  classify  attacks  in  an  ad  hoc  network.  Across  the 
x-axis,  we  group  attacks  based  on  the  type  of  threat  they  pose  to  the  network.  Along  the  y-axis,  each  attack 
is  classified  based  on  the  layer  at  which  the  attack  can  be  launched.  Along  the  z-axis,  attacks  are  classified 
according  to  the  behavior  of  the  nodes  and  the  type  of  the  attack.  The  primary  behaviors  demonstrated  by 
attackers  are  silence,  selfishness,  misbehavior,  mischief  and  interference.  The  classification  scheme  for 
attacks  that  we  have  developed  is  shown  in  the  figure  below. 


The  classification  scheme  for  attacks  that  we  developed. 

The  AODVSTAT  Emulation  System  (Belding-Royer).  To  expand  the  capabilities  of  AODVSTAT  and 
to  test  it  in  larger  systems,  we  have  developed  an  emulation  system  for  AODVSTAT  integrated  with  the 
NS-2  network  simulator.  IDS  sensors  are  placed  within  the  simulated  network.  When  they  receive 
messages,  these  messages  are  captured  and  then  recreated  as  actual  Ethernet  frames  so  that  they  can  be  sent 
to  the  AODVSTAT  module  for  processing.  The  emulation  system  allows  us  to  use  the  actual  AODVSTAT 
code in the evaluation of large, diverse networks. The emulation system enables us to investigate the
performance  of  large  networks,  the  size  of  which  is  only  limited  by  the  processing  power  of  the  machines 
running  the  simulations.  To  date,  we  have  studied  networks  as  large  as  50  nodes. 

The Authenticated Routing for Ad hoc Networks (ARAN) Security Protocol (Belding-Royer). The
ARAN protocol detects and protects against malicious actions by third parties and peers in managed-open
ad  hoc  environments.  ARAN  introduces  authentication,  message  integrity  and  non-repudiation  to  an  ad  hoc 
environment  as  a  part  of  a  minimal  security  policy.  Our  evaluations  show  ARAN  has  minimal 

•  Performance  costs  for  the  increased  security  in  terms  of  processing 

•  Networking  overhead. 

As another part of this work, we defined a set of three discrete ad hoc wireless environments based on the
opportunity  for  pre-deployment  exchange  of  security  parameters.  These  environments  are  termed  open, 
managed-open  and  managed-hostile.  By  clearly  defining  the  level  of  security  needed  in  a  particular 
scenario,  the  appropriate  security  mechanisms  can  be  deployed  to  protect  the  network  while  not  needlessly 
burdening  it  with  security  procedures. 

The  figure  below  shows  a  graph  of  the  average  packet  delivery  fraction  of  ARAN  and  AODV.  The  graph 
indicates  that,  even  though  ARAN  has  more  overhead  due  to  the  security  mechanisms,  it  is  still  able  to 
successfully  deliver  the  same  percentage  of  packets  as  AODV. 

The average packet delivery fraction of ARAN and AODV.

The  next  figure  shows  a  graph  of  the  fraction  of  data  packets  sent  through  malicious  nodes  with  ARAN  and 
AODV.  It  indicates  that,  when  there  are  malicious  nodes  in  the  network,  those  nodes  are  able  to  influence 
the  routes  AODV  selects  such  that  a  larger  percentage  of  packets  traverse  those  nodes.  Malicious  nodes  can 
snoop,  modify  and  drop  packets,  as  well  as  perform  any  of  a  number  of  other  undesirable  events.  However, 
with  ARAN,  malicious  nodes  have  no  way  of  influencing  the  paths  that  are  selected,  and  so  only  if  they  are 
on  the  shortest  path  between  the  source  and  destination  will  they  end  up  on  the  selected  path.  Hence,  far 
fewer  packets  end  up  traversing  malicious  nodes. 


The  fraction  of  data  packets  sent  through  malicious  nodes  with  ARAN  and  AODV. 


4.4 Protocols for Supporting Scalable Real-Time Multicasts

QoS-Driven  Multicast  Routing  (Hou):  We  have  investigated  the  problem  of  QoS  provisioning  in  shared 
multicast  routing  protocols  such  as  Core-Based  Tree,  Simple  Multicast  and  Protocol  Independent  Multicast 
Sparse  Mode,  and  devised  a  unified  QoS  extension  framework  for  these  kinds  of  multicast  routing 
protocols.  Specifically,  the  QoS  requirements  are  heterogeneous  among  receivers,  and  are  specified  as 
either (R1) an upper/lower bound of an additive, multiplicative, or concave QoS parameter (e.g., the
end-to-end delay bound), or (R2) an upper bound of the inter-receiver difference of a QoS parameter (e.g., the
inter-receiver  delay  jitter  bound).  Considering  these  QoS  requirements,  we  have  devised  a  set  of 
enhancements  to  the  member  join/leave  and  state  update/refresh  procedures  to  facilitate  the  deployment  of 
additive  (e.g.,  end-to-end  delay  bound),  multiplicative  (e.g.,  packet  loss  ratio  along  a  path)  and  concave 
(e.g.,  minimum  bandwidth  available)  QoS.  In  particular,  we  have  (i)  devised  eligibility  tests  to  check 
whether  a  new  member  can  join  a  multicast  tree  at  adequate  QoS,  without  violating  the  existing  QoS  to  the 
other  members  on  the  tree,  (ii)  determined  the  set  of  states  kept  at  each  router  in  order  to  conduct  eligibility 
tests,  (iii)  devised  a  state  update/refresh  procedure  that  is  based  on  soft  state  and  can  be  readily  integrated 
with  the  tree  maintenance  mechanism  that  already  exists  in  most  core-based  multicast  routing  protocols,  and 
(iv)  implemented,  based  on  the  CBT  daemon  in  FreeBSD  (developed  by  British  Telecom),  and  empirically 
evaluated the performance of the QoS framework. We have reported our evaluation results in the
Inter-Domain Multicast Routing Working Group of the Internet Engineering Task Force.
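
To make the three classes of QoS parameters concrete, the sketch below shows how each class composes along a path, together with a simple eligibility test for an additive bound of type (R1). The function names and the form of the test are illustrative assumptions; they are not the tests implemented in the CBT-based prototype.

```python
import math


def path_delay(link_delays):
    """Additive metric: the end-to-end delay is the sum of the link delays."""
    return sum(link_delays)


def path_loss_ratio(link_loss_ratios):
    """Multiplicative metric: a packet survives only if every link delivers it."""
    delivered = math.prod(1.0 - p for p in link_loss_ratios)
    return 1.0 - delivered


def path_bandwidth(link_bandwidths):
    """Concave metric: the bottleneck link limits the whole path."""
    return min(link_bandwidths)


def can_join(delay_to_graft_point, branch_link_delays, delay_bound):
    """Hypothetical eligibility test for an additive (R1) delay bound:
    delay already accumulated on the tree plus the delay of the new branch."""
    return delay_to_graft_point + path_delay(branch_link_delays) <= delay_bound
```

For example, a receiver with a 40 ms delay bound could graft onto an on-tree router that is 20 ms from the core through a branch with link delays of 5 ms and 10 ms, whereas a receiver with a 30 ms bound could not.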

Scalable,  real-time  video  multicast  protocols  (Suda):  In  this  research,  we  have  investigated  a  novel 
protocol, called Source-Adaptive Multi-layered Multicast (SAMM), to control congestion caused by
real-time video multicast over the Internet. SAMM relies on the exchange of feedback between end systems (a
video  source  and  multicast  destinations)  and  rate  adjustment  at  a  video  source.  In  SAMM,  multi-layered 
video  encoding  is  deployed,  and  raw  video  data  is  encoded  into  one  or  more  layers  of  differing  priority. 
Video  sources  adapt  the  number  of  video  layers  they  generate  as  well  as  the  transmission  rate  of  each  video 
layer  in  response  to  congestion  feedback  from  the  network  and  receivers.  Using  simulations  that  incorporate 
multi-layered  video  codecs,  we  have  demonstrated  that  SAMM  exhibits  better  scalability  and 
responsiveness  to  congestion  than  existing  protocols.  Accomplishments  in  this  work  include  design  of  the 
SAMM  protocols,  a  simulator  to  evaluate  the  proposed  protocols  and  simulation  results  confirming  the 
performance  of  the  proposed  protocols. 

Multicast  Group  Communication  Engine  and  Bridge  (Moser  and  Melliar-Smith):  Multicast  group 
communication  protocols  are  needed  for  fault-tolerant  distributed  systems  to  maintain  strong  replica 
consistency.  However,  different  multicast  protocols  are  appropriate  for  different  applications  and 
environments.  As  part  of  this  MURI  project,  we  have  defined  a  multicast  group  communication  engine  and 
bridge  that  allows  use  of  multiple  group  communication  protocols  concurrently.  The  group  communication 
engine  uses  Lamport  timestamps  for  message  ordering  and  heartbeat  (null)  messages  for  liveness. 
Timestamps  and  heartbeat  messages  provide  a  convenient  mechanism  by  which  multiple  group 
communication  protocols  can  coexist  without  constraining  the  on-the-wire  message  formats  and  internal 
algorithms of the protocols.

The  group  communication  engine  places  timestamps  on  messages,  and  multicasts  messages  to  groups,  using 
one  or  more  group  communication  protocols.  Each  multicast  group  communication  protocol  reliably 
delivers  its  timestamped  messages  in  timestamp  order  to  the  group  communication  engine.  The  group 
communication  engine  integrates  the  streams  of  messages  into  a  single  stream  of  messages  for  delivery  in 
timestamp  order.  No  forwarding  or  conversion  from  one  multicast  protocol  to  another  is  necessary,  and  the 
multicast protocols never communicate directly with one another or even know of each other's existence.
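
The integration of several timestamp-ordered streams into a single stream can be sketched as follows. This is a minimal illustration under our own assumptions: the class `MergeEngine`, the use of `None` as the payload of a heartbeat, and the tie-break on the sender identifier are hypothetical and are not taken from the engine itself.

```python
from collections import deque


class MergeEngine:
    """Merges per-protocol streams, each already in timestamp order."""

    def __init__(self, protocol_names):
        if not protocol_names:
            raise ValueError("at least one protocol is required")
        self.queues = {name: deque() for name in protocol_names}

    def receive(self, protocol, timestamp, sender, payload=None):
        """Called by a protocol; a payload of None denotes a heartbeat."""
        self.queues[protocol].append((timestamp, sender, payload))

    def deliverable(self):
        """Return the payloads that can now be delivered in timestamp order."""
        delivered = []
        # A head can be released only while every stream has a head,
        # because each stream is itself ordered by timestamp.
        while all(self.queues.values()):
            name = min(self.queues, key=lambda n: self.queues[n][0][:2])
            timestamp, sender, payload = self.queues[name].popleft()
            if payload is not None:  # heartbeats only advance the stream
                delivered.append(payload)
        return delivered
```

The sketch shows why heartbeat messages matter for liveness: a message from one protocol cannot be released until every other protocol has shown, by a data message or a heartbeat, that it will not deliver anything with an earlier timestamp.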

Real-Time Fault Tolerance Protocol (Melliar-Smith and Moser): Wenbing Zhao, Michael
Melliar-Smith and Louise Moser have developed a Real-Time Fault Tolerance Protocol that provides fault tolerance
for  distributed  applications  using  the  primary/backup  process  replication  approach.  The  protocol  suite 
consists  of  a  reliable  messaging  protocol,  a  membership  protocol,  and  a  virtual  determinizer.  The  messaging 
protocol  provides  a  reliable  ordered  message  delivery  service  by  employing  a  direct  group-to-group 
multicast,  where  the  ordering  is  determined  by  the  primary  replica  in  each  process  group.  The  membership 
protocol  provides  a  fast  reconfiguration  and  recovery  service  when  one  or  more  replicas  become  faulty  and 
when  a  replica  joins  or  leaves  a  process  group.  The  virtual  determinizer  captures  the  ordering  information  at 
the primary and enforces the same ordering at the backups for several major sources of replica
non-determinism.

4.5 Protocols for Supporting Mobile Wireless Environments

Mitigating  Misbehavior  in  Ad  Hoc  Routing  Protocols  (Baker):  Gaining  reasonable  traffic  throughput  in 
ad  hoc  networks  requires  the  nodes  in  the  network  to  participate  in  packet  forwarding  on  behalf  of  other 
nodes.  The  presence  of  nodes  that  misbehave  in  this  respect  degrades  network  performance.  Misbehaving 
nodes  can  assume  at  least  two  forms:  nodes  that  agree  to  forward  packets  on  behalf  of  others  but  then  fail  to 
do  so,  or  merely  selfish  nodes  that  attempt  to  preserve  their  own  resources  by  not  agreeing  to  forward 
packets. 

We  have  developed  two  general  types  of  routing  enhancements  to  handle  such  misbehaviors.  The  first  set 
of  techniques,  called  Watchdog  and  Pathrater,  allows  us  to  isolate  and  route  around  nodes  that  agree  to 
participate  but  fail  to  do  so.  The  second  technique,  called  OCEAN  (Observation-based  Cooperation 
Enforcement  in  Ad  hoc  Networks),  attempts  to  limit  both  forms  of  misbehavior  through  punishing  bad  nodes 
by  denying  their  traffic. 

Our  approach  in  both  avoiding  misbehaving  nodes  and  in  providing  participation  incentives  is  to  avoid 
modifying  existing  routing  protocols,  which  would  be  error-prone  and  complex.  Instead,  we  provide 
information  to  an  existing  routing  protocol  to  allow  it  to  make  a  better  choice  than  the  default  route  choice. 

A second goal, especially with the incentive-based techniques, is that the protocol enhancements be
lightweight.  This  means  that  in  forming  opinions  about  the  behavior  of  other  nodes,  we  rely  only  on 
directly  observed  events,  and  not  on  reputation  information  as  reported  by  other  parties.  Such  direct 
observation  allows  us  to  avoid  the  overhead  of  propagating  reputation  information  throughout  the  network. 
In  addition,  this  means  that  unlike  Blazevic’s  work  we  do  not  need  a  tamper-resistant  layer  on  each  node, 
and  unlike  Buchegger’s  work  we  do  not  need  signing  or  other  encryption. 
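
The idea of rating nodes purely from directly observed events, and of feeding that information to an unmodified routing protocol, can be sketched as follows. This is a simplified illustration and not the Watchdog, Pathrater or OCEAN algorithms themselves; the class name, the rating increments and the threshold are assumed values chosen only for the example.

```python
from collections import defaultdict


class DirectObserver:
    """Rates neighbors using only events that this node observed itself."""

    def __init__(self, faulty_threshold=-3):
        self.rating = defaultdict(int)
        self.faulty_threshold = faulty_threshold  # illustrative value

    def observed_forward(self, neighbor):
        self.rating[neighbor] += 1

    def observed_drop(self, neighbor):
        self.rating[neighbor] -= 2  # illustrative penalty

    def is_suspect(self, neighbor):
        return self.rating[neighbor] <= self.faulty_threshold

    def best_route(self, candidate_routes):
        """Choose among routes offered by the unmodified routing protocol,
        preferring routes with fewer suspect nodes, then shorter routes."""
        def cost(route):
            suspects = sum(self.is_suspect(node) for node in route)
            return (suspects, len(route))
        return min(candidate_routes, key=cost)
```

Because the ratings are purely local, no reputation messages are exchanged, which is the source of the low overhead described above.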

We simulated Watchdog and Pathrater in the Dynamic Source Routing protocol (DSR) implementation in
the ns-2 simulator. When used together in an ad hoc network of moderate mobility running DSR, the
Watchdog and Pathrater algorithms improve packet throughput by
17% in the presence of even 40% misbehaving nodes while increasing routing protocol overhead by 9%
(from DSR's 12% to 17%).

Our  incentive-based  techniques  are  still  under  development.  So  far  we  have  evaluated  these  techniques 
within  a  high-level  Java  simulation  environment  specifically  designed  for  testing  such  protocols  quickly. 
This  simulator  does  not  give  us  accurate  timing  results,  since  it  has  no  model  for  the  underlying  wireless 
network,  but  it  allows  us  to  measure  overall  throughput  during  the  course  of  the  simulation.  Within  these 
simulations,  we  find  that  OCEAN  improves  the  traffic  flows  of  well-behaved  nodes  by  up  to  15%  while 
dramatically  lowering  the  throughput  of  misbehaving  nodes  (by  up  to  75%). 

Improving Ad Hoc Network Performance through New Packet Scheduling Algorithms (Baker): We
have investigated improving the overall performance of ad hoc networks through new packet scheduling.
For  this  study,  we  examined  the  queuing  dynamics  of  nodes  in  an  ad  hoc  network  across  a  wide  mobility 
spectrum  (from  static  to  non-stop  motion)  in  both  on-demand  (DSR)  and  pro-active  protocols  (Greedy 
Perimeter  Stateless  Routing,  or  GPSR  [Karp])  as  implemented  in  the  ns-2  simulator.  For  this  packet 
scheduling  work,  we  again  do  not  modify  the  routing  protocols  themselves  but  instead  modify  only  the 
processing  of  packet  queues  at  a  node.  We  find  that  the  common  scheduling  practice  of  giving  priority  to 
routing  control  packets  over  data  packets  is  an  advantage  in  on-demand  routing  protocols  such  as  DSR,  but 
that  it  can  actually  reduce  performance  in  proactive  routing  protocols.  Most  ad  hoc  network  packet 
schedulers  do  not  distinguish  between  different  types  of  data  packets,  but  we  find  that  it  is  useful  to  do  so. 
In particular, giving priority to data packets with short distance metrics (fewest remaining hops in DSR and
shortest remaining distance in GPSR) yields the smallest delay (about a 30% reduction relative to standard DSR
and standard GPSR) and the highest throughput, without increasing routing overhead.
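
A queue that serves data packets with the fewest remaining hops first can be sketched with a binary heap. The class below is a hypothetical illustration that covers data packets only; it is not the ns-2 queue implementation used in the study, and the first-in, first-out tie-break among packets with equal remaining hops is our assumption.

```python
import heapq
import itertools


class RemainingHopsScheduler:
    """Serves queued data packets with the fewest remaining hops first."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # FIFO tie-break among equal hops

    def enqueue(self, packet_id, remaining_hops):
        entry = (remaining_hops, next(self._arrival), packet_id)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        remaining_hops, _, packet_id = heapq.heappop(self._heap)
        return packet_id

    def __len__(self):
        return len(self._heap)
```

For GPSR, the same structure applies with the remaining geographic distance to the destination used in place of the remaining hop count.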

Wireless  Networks  (Garcia-Luna-Aceves):  Garcia-Luna-Aceves  and  his  students  have  designed  new 
algorithms  and  protocols  for  channel  access,  routing,  topology  control  and  broadcasting;  verified  the 
correctness  of  these  algorithms  and  protocols;  and  analyzed  their  performance  through  analytical  models 
and  simulations.  The  simulations  were  carried  out  in  GloMoSim,  Qualnet  and  ns2,  which  are  used  widely  in 
the  research  community. 

We have developed analytical models for collision avoidance protocols operating in multihop ad hoc
networks with omni-directional or directional antennas. These models were extensions of prior work on CSMA and
ALOHA and were the first to address collision avoidance schemes in multihop wireless networks.

We  have  developed  new  channel  access  protocols  based  on  node  and  link  activation  and  applied  them  to 
sensor  networks.  We  have  started  the  development  of  flow-oriented  scheduled  channel  access,  which 
extends our prior results on scheduled channel access by using the flows traversing network nodes as the
entities that compete for scheduled channel access.

In  contrast  to  a  wired  network,  the  scheduling  decisions  made  at  the  MAC  layer  in  wireless  networks  impact 
the  de-facto  topology  over  which  routing  and  multicasting  must  operate.  Hence,  it  is  important  to 
understand the ability to manage the useful topology of a wireless network to make network-level and
end-to-end protocols more effective in multihop wireless networks. We have developed a novel approach to the
solution of the connected dominating set election problem, which we call Topology Management by
Priority Ordering (TMPO). Our approach uses the neighbor-aware contention resolution (NCR) algorithm
employed in our prior collision-free channel access protocols to provide fast convergence and
load-balancing with regard to the battery life and mobility of mobile nodes. Based on NCR, TMPO assigns
randomized  priorities  to  mobile  stations,  and  elects  a  minimal  dominating  set  (MDS)  and  the  connected 
dominating  set  (CDS)  of  an  ad  hoc  network  according  to  these  priorities.  In  doing  so,  TMPO  requires  only 
two-hop  neighbor  information  for  the  MDS  elections.  The  dynamic  priorities  assigned  to  nodes  are  derived 
from  the  node  identifiers  and  their  willingness  to  participate  in  the  backbone  formations.  The  willingness  of 
a  node  is  a  function  of  the  mobility  and  battery  life  of  the  node.  The  integrated  consideration  of  mobility, 
battery  life  and  deterministic  node  priorities  makes  TMPO  the  best-performing  heuristic  for  topology 
management  in  ad  hoc  networks  to  date. 
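
A priority-based dominating set election of this kind can be sketched as follows. The priority function and the election rule are our simplifying assumptions for illustration: each node's priority combines its willingness with a pseudo-random value derived from its identifier, and a node becomes a dominator if it has the highest priority in the closed one-hop neighborhood of itself or of one of its neighbors. The sketch covers only the dominating set election, not the construction of the connected dominating set.

```python
import hashlib


def priority(node_id, willingness, epoch):
    """Hypothetical priority: willingness first, then a pseudo-random value
    derived from the node identifier and the epoch, then the identifier."""
    digest = hashlib.sha256(f"{node_id}:{epoch}".encode()).hexdigest()
    return (willingness, digest, node_id)


def elect_dominators(neighbors, willingness, epoch=0):
    """`neighbors` maps each node to the set of its one-hop neighbors."""
    prio = {v: priority(v, willingness[v], epoch) for v in neighbors}
    dominators = set()
    for v in neighbors:
        closed = neighbors[v] | {v}
        # The top-priority node in v's closed neighborhood is a dominator.
        dominators.add(max(closed, key=prio.get))
    return dominators
```

The loop is written centrally for brevity. In a distributed setting, a node can evaluate the same rule for itself once it knows the neighbor sets of its own neighbors, which is two-hop neighbor information. Every node is either elected or adjacent to the elected top-priority node of its own neighborhood, so the result is a dominating set.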

We  have  developed  a  mesh-oriented  multicast  protocol  that  eliminates  the  limitations  of  tree-based 
multicasting, ODMRP, and CAMP. Our protocol, which we call PUMA, is the first multicast routing
protocol  for  ad  hoc  networks  based  on  meshes  and  a  receiver-initiated  group  joining  scheme  that  does  not 
require  an  underlying  unicast  routing  protocol  to  operate  or  the  pre-assignment  of  cores  to  groups.  In  a 
nutshell,  the  first  receiver  joining  a  group  becomes  the  core  of  the  group  and  starts  transmitting  core 
announcements  periodically.  Each  such  packet  specifies  a  sequence  number,  the  address  of  the  group,  the 
address  of  the  core,  the  sending  node,  and  the  distance  to  the  core.  Routers  use  the  best  core 
announcements  they  receive  to  send  their  own  core  announcements  to  their  neighbors,  and  over  time  each 
router  has  one  or  multiple  paths  to  the  core  of  each  known  group  in  the  ad  hoc  network.  To  join  a  multicast 
group, a router sends a join announcement to its next-hop toward the core of the group, which it learns from
core  announcements.  Nodes  receiving  join  announcements  intended  for  them  join  the  group  and  also  send  a 
join  announcement  periodically  to  their  next-hops  for  the  group  core.  To  attain  a  mesh  structure,  routers  that 
overhear  join  announcement  packets  not  addressed  to  them  from  two  or  more  neighbors  join  the  group 
silently,  i.e.,  they  do  not  send  join  announcements.  Normal  group  members  send  join  announcements 
periodically.  A  multicast  data  packet  for  a  group  is  forwarded  from  its  source  towards  the  core  of  the 
group,  using  next-hop  information  obtained  in  core  announcements,  and  is  flooded  within  the  mesh  of  the 
group  as  soon  as  it  reaches  the  first  mesh  member.  Members  and  silent  members  participate  in  the  flooding 
of  data  packets  within  a  mesh.  We  have  shown  through  extensive  simulations  in  QualNet  that  PUMA 
dramatically  outperforms  ODMRP  and  MAODV. 
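
The handling of core announcements described above can be sketched as follows. The announcement fields follow the list given in the text; the ranking of announcements (fresher sequence number first, then shorter distance to the core) and the `Router` class are assumptions made for this illustration, and the sketch keeps a single best path per group rather than multiple paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreAnnouncement:
    sequence_number: int
    group_address: str
    core_address: str
    sender: str
    distance_to_core: int


def better(a, b):
    """Assumed ranking: fresher sequence number first, then shorter distance."""
    return (a.sequence_number, -a.distance_to_core) > (
        b.sequence_number, -b.distance_to_core)


class Router:
    def __init__(self, address):
        self.address = address
        self.best = {}  # group address -> best announcement heard so far

    def on_announcement(self, ann):
        """Record the announcement if it is the best one heard; return the
        announcement this router would send to its neighbors, or None."""
        current = self.best.get(ann.group_address)
        if current is not None and not better(ann, current):
            return None
        self.best[ann.group_address] = ann
        return CoreAnnouncement(ann.sequence_number, ann.group_address,
                                ann.core_address, self.address,
                                ann.distance_to_core + 1)

    def next_hop(self, group_address):
        """Next hop toward the core, learned from core announcements."""
        ann = self.best.get(group_address)
        return ann.sender if ann else None
```

A join announcement or a data packet headed for the group would then be sent to `next_hop(group_address)`.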

We  have  developed  several  techniques  to  maintain  loop-free  on-demand  routing  in  ad  hoc  networks, 
regardless  of  the  changes  in  the  underlying  network  topology.  These  schemes  address  the  use  of  path 
information  as  labels  used  for  ordering  of  nodes  with  respect  to  a  destination,  the  use  of  sequence  numbers 
as  ordering  labels,  the  use  of  path  information  together  with  information  about  links  that  caused  routes  to 
change,  and  the  combination  of  sequence  numbers  with  limited  inter-neighbor  coordination.  All  the 
proposed  schemes  work  on  the  basis  of  route  requests  (RREQ),  route  replies  (RREP),  and  route  errors 
(RERR)  in  their  messaging  structure,  which  is  similar  to  that  used  in  DSR  and  AODV.  However,  the 
proposed  schemes  use  new  loop-free  invariants  for  each  destination.  All  of  these  schemes  have  been  shown 
to  outperform  AODV  and  DSR  significantly.  Furthermore,  we  have  demonstrated  that  AODV  and  similar 
protocols  based  on  sequence  numbers  can  incur  looping  and  counting  to  infinity  problems  when  nodes 
delete  routing  state  for  certain  destinations  due  to  rebooting  or  other  causes.  AODV  and  the  more  recent 
DYMO  specification  can  incorporate  our  proposed  schemes  to  solve  the  looping  problems  we  uncovered. 

We have investigated distributed algorithms for pruning the number of nodes needed to forward route
requests  or  other  type  of  broadcast  packets  in  ad  hoc  networks.  We  have  also  developed  a  number  of 
algorithms  to  support  clustering  in  ad  hoc  networks,  such  that  a  given  node  can  be  covered  by  multiple 
cluster  heads,  each  within  a  maximum  number  of  hops,  while  trying  to  minimize  the  number  of  cluster 
heads  needed  to  cover  all  network  nodes. 


Topology  Control  in  Mobile  Ad-hoc  Networks  (Hou):  Topology  control  and  management  (how  to 
determine  the  transmit  power  of  each  node  so  as  to  maintain  network  connectivity  while  consuming  the 
minimum  possible  power)  is  an  important  issue  in  mobile  ad  hoc  networks  (MANETs).  Instead  of 
transmitting  using  the  maximum  possible  power,  nodes  in  a  MANET  collaboratively  determine  their 
transmit  power  and  define  the  topology  of  the  MANET  by  their  neighbor  relations  under  certain  criteria. 

Topology  control  and  management  is  important  in  that  it  critically  affects  the  system  performance  in  a 
cross-layer  manner.  Determination  of  the  transmit  power  level  affects  several  node  attributes:  (i)  the 
transmission  range  (as  well  as  the  interference  caused  by  transmission),  and  (ii)  the  node  degree  (and  hence 
the  scope  of  the  physical  neighborhood).  Consequently,  topology  control  and  management  influences  the 
Physical  layer  through  the  quality  of  the  received  signal,  the  MAC  layer  through  the  interference  and 
contention  caused  by  communication,  the  Network  layer  through  the  set  of  links  that  are  formed  (and  hence 
the  connectivity),  the  Transport  layer  through  the  overall  data  transport  capability  (and  hence  the  network 
capacity)  and,  last  but  not  least,  the  drain  on  battery  energy.  As  a  result,  determining  adequate  transmit 
power  in  a  decentralized  manner  is  critical  to  several  aspects  of  sensor  network  functionality.  It  affects 
network  spatial  reuse  and  hence  network  capacity.  It  affects  network  connectivity  —  choosing  too  large  a 
transmit  power  level  results  in  excessive  interference,  while  choosing  too  small  a  transmit  power  level  can 
result  in  a  disconnected  network.  It  also  has  an  effect  on  the  contention  for  the  medium.  MAC-level 
contention  and  collision  can  be  mitigated  as  much  as  possible  by  choosing  the  smallest  transmit  power 
subject  to  maintaining  network  connectivity.  Finally,  it  affects  the  energy  use  for  communication,  and  thus 
impacts  battery  life,  a  critical  resource  for  many  MANET  applications. 

We  have  devised  a  Local  Minimum  Spanning  Tree  (LMST)  algorithm  and  variations  for  topology  control 
and  management.  In  LMST,  each  node  builds  its  local  minimum  spanning  tree  independently  with  the  use 
of locally collected information, and keeps only on-tree nodes that are one hop away as its neighbors in
the final topology. We have proved analytically that, if every node executes LMST, the network
connectivity is preserved (i.e., if the original topology, in which each node uses the maximal transmit power,
is connected, the topology derived under LMST is also connected). An important feature of LMST is that it
is  fully  localized  (i.e.,  it  depends  only  on  the  information  collected  locally),  and  hence  is  less  susceptible  to 
the  impact  of  mobility  and  incurs  less  message  overhead/delay.  In  addition,  we  have  proved  that  (1)  the 
node  degree  of  any  node  in  the  resulting  topology  is  bounded  by  6,  and  (2)  the  topology  can  be  transformed 
into one with bi-directional links (without impairing network connectivity) after removal of all
uni-directional links (see the first figure below). Simulation results indicate that, compared with existing
topology  control  algorithms  (e.g.,  cone-based  topology  control  by  Li  et  al.  and  relay-region-based  topology 
control  by  Meng  et  al.),  the  topology  derived  under  LMST  has  smaller  average  node  degrees  (both  logical 
and physical) and smaller average link length (see the second figure below). The former reduces the
MAC-level contention, while the latter implies that less transmission power is needed to maintain connectivity.
Moreover,  as  shown  in  the  third  figure  below,  LMST  realizes  a  system  throughput  improvement  of  25-50%. 

(a) The topology derived using maximum transmission power.


(b)  The  topology  derived  under  LMST 


Network  topologies  with  the  use  of  maximum  transmission. 


(a)  Average  degree  (b)  Average  link  length 

Performance comparisons w.r.t. node degree and average length of links among different algorithms. The n nodes are
uniformly distributed in a 1 km x 1 km region, where n varies from 50 to 250.
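
The neighbor selection step of LMST, as described above, can be sketched as follows: a node builds a minimum spanning tree over the nodes it can reach at maximum power and keeps only its one-hop on-tree nodes. The sketch assumes that node positions are known, that Euclidean distance is the link weight, and that ties are broken by node name; the function name and these choices are illustrative, not the reported implementation.

```python
import math


def local_mst_neighbors(u, positions, max_range):
    """Neighbors that node u keeps under an LMST-style rule."""
    def dist(a, b):
        return math.dist(positions[a], positions[b])

    # Visible neighborhood: u plus every node within its maximum range.
    visible = [v for v in positions if dist(u, v) <= max_range]

    # Prim's algorithm over links no longer than max_range.
    in_tree = {u}
    tree_edges = []
    while len(in_tree) < len(visible):
        candidates = [(dist(a, b), a, b)
                      for a in in_tree for b in visible
                      if b not in in_tree and dist(a, b) <= max_range]
        if not candidates:
            break  # not expected: every visible node is within range of u
        _, a, b = min(candidates)
        in_tree.add(b)
        tree_edges.append((a, b))

    # Keep only the on-tree nodes that are one hop from u on the tree.
    return {b if a == u else a for a, b in tree_edges if u in (a, b)}
```

For four nodes placed one unit apart on a line with a maximum range of 2.5 units, each node keeps only its immediate neighbors on the line, even though it can reach nodes two units away at maximum power.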

As  part  of  the  extension  to  LMST,  we  have  shown  that  most  of  the  topology  control  algorithms  (including 
LMST)  render  sub-optimal  performance  or  even  fail  when  different  nodes  have  different  maximal 
transmission ranges. The major problem is that the connection between a pair of nodes might not be
bi-directional in a topology of heterogeneous networks. This scenario is typical in the battlefield where
different warfare entities have different capabilities. To deal with this problem, we have extended LMST
and devised a localized algorithm for heterogeneous wireless networks with non-uniform transmission
ranges. Essentially, a minimum spanning graph (instead of a minimum spanning tree) is built
independently  in  the  neighborhood  of  each  node. 

Another  extension  of  LMST  that  we  investigated  is  to  equip  topology-controlled  networks  with  fault 
tolerance  capabilities.  In  a  controlled  topology,  if  a  node  fails  (due  to  power  depletion  and/or  malicious 
destruction)  or  moves  away,  the  network  is  more  susceptible  to  temporary  disconnection  (as  a  result  of 
reduced  routing  redundancy).  Obviously,  there  exists  a  tradeoff  between  route  redundancy  and  other 
performance  aspects  (power  consumption,  spatial  reuse,  MAC  level  interference,  and  network  capacity). 
We have extended LMST to preserve k-connectivity and have evaluated it via simulation and empirical
studies  with  respect  to  the  decrease  in  overall  system  throughput  caused  by  incorporating  fault  tolerance. 

Total throughput and energy efficiency.


The figure shows the total amount of data delivered (in bytes) and the energy efficiency (in bytes/J). The n nodes are
randomly distributed in a 1500m x 200m region; n/2 of them are sources and the other n/2 are destinations,
with CBR traffic between them. The two-ray ground model is used as the propagation model, IEEE 802.11
is used as the MAC protocol, and AODV is used as the routing protocol. The start time of a connection is
uniformly distributed in the range [25s, 50s] in a simulation run of 200 seconds.

Power  Management  for  Mobile  Ad-Hoc  Environments  (Hou):  We  have  investigated  the  issue  of  power 
management in multi-hop wireless networks. The study hinges on the observation that wireless devices
consume significant power in idle states, as demonstrated by other experimental studies and our
own measurement results. As most modern wireless devices can be set to different power states to conserve
energy, power management (which switches devices to low-power states when they are not in use) is a
promising  technique  to  conserve  energy.  The  objective  of  this  study  was  to  devise  and  evaluate  power 
management  schemes  to  reduce  the  energy  consumption  in  idle  periods  while  minimizing  the  performance 
penalty  incurred. 


Performance  trade-off  between  end-to-end  delay  and  energy  efficiency. 


In this study, we discovered that the effectiveness of power management is determined by the degree of
coordination among communicating entities and the amount of cross-layer information utilized. We have
devised a suite of protocols that explore the design space from asynchronous to synchronous, from reactive to
proactive, and from agnostic of cross-layer information to utilizing hints from the higher layers. We
have  quantified  the  impact  of  the  degree  of  coordination  and  pro-activeness  on  energy-performance 
tradeoffs  through  combinatorial  analysis  and  queuing  models.  Finally,  we  evaluated  the  protocols  using 
both  event-driven  simulations  and  experiments  on  a  lab  motes  testbed.  We  have  designed  the 
communication  stack  and  modular  support  for  power  management  in  TinyOS  and  also  a  suite  of  benchmark 
tests  to  profile  the  energy  consumption  of  wireless  sensor  nodes.  Specifically,  we  have  carried  out  the 
following  research  tasks: 

•  Reactiveness  in  maintaining  and  distributing  state  information  is  an  important  design  principle  in 
multi-hop  wireless  communication.  On  the  one  hand,  distribution  of  state  information  can  be  costly 
in  terms  of  power  consumption  due  to  the  size  of  the  network.  On  the  other  hand,  state  information 
can  become  stale  quickly  because  of  network  dynamics.  We  have  applied  such  design  principles  and 
have  proposed  an  on-demand  power  management  framework  to  address  the  second  problem.  The 
key idea is to switch a node to the low-power state upon detection of prolonged idleness. Soft states for
power management are established and refreshed by data and control packet transmissions. On
timeout  of  the  soft  state,  a  node  is  switched  to  the  low-power  state.  We  have  conducted  extensive 
simulations  to  study  the  sensitivity  of  the  parameter  settings,  the  impact  of  cross-layer  information 
and  the  performance  of  on-demand  power  management  in  conjunction  with  various  routing 
protocols. We have demonstrated that on-demand power management effectively ties
energy consumption to the communication in the network, as only nodes on the
communication  path  are  kept  active  (see  the  figure  below).  Although  the  reactive  protocols  usually 
suffer  from  long  startup  time  in  establishing  the  initial  set  of  states,  on-demand  power  management 
alleviates  such  problems  by  providing  a  control  parameter  to  adjust  the  delay  incurred  with  respect 
to  traffic  load. 

•  Power  management  in  multi-hop  wireless  networks  differs  from  stand-alone  systems  in  that 
distributed  coordination  is  required,  as  all  the  entities  have  to  be  put  in  the  active  state  for  the 
communication  to  take  place.  An  important  question  is  how  to  coordinate  the  power  management 
states  among  distributed  communication  entities  without  global  synchronization.  Without  prior 
knowledge  of  the  set  of  neighbors  and  in  highly  mobile  environments  where  neighbor  information  is 
volatile,  it  is  desirable  to  have  deterministic  bounds  to  discover  the  neighbors  and  their  power 
management  schedules  while  minimizing  the  energy  consumption  in  idle  periods.  We  have  designed 
an  asynchronous  wakeup  schedule  based  on  the  theory  of  block  designs.  The  idea  is  similar  to 
ensuring  mutual  exclusion  in  distributed  systems  but  with  a  stronger  shift-invariant  property.  The 
theoretical  result  is  applicable  to  both  homogeneous  and  heterogeneous  scenarios  where  nodes  have 
similar  or  different  residual  battery  power.  The  asynchronous  wakeup  mechanism  works  with  on- 
demand  power  management  seamlessly  to  provide  energy  savings  with  a  controllable  impact  on 
performance.  From  the  simulation  studies,  we  observe  40-80%  savings  in  energy  consumption,  while 
keeping  the  average  delay  comparable  to  a  network  without  power  management. 

•  It  has  long  been  recognized  that  energy  conservation  usually  comes  at  the  cost  of  degraded 
performance  such  as  delay  and  throughput  both  in  stand-alone  systems  and  communication 
networks.  We  have  developed  analytical  models  to  quantify  such  a  tradeoff  under  different  power 
management  policies.  Based  on  the  decision  when  to  put  nodes  to  low-power  states,  we  have 
categorized power management policies into two classes, timeout-driven and polling-based. An
M/G/1/K model with multiple vacations and attention spans is used to characterize the timeout-driven
policies, while a transient analysis is applied to derive state transition probabilities in polling-based
systems. We have shown numerically how different system parameters affect the energy-performance
tradeoff. We found that, for timeout-driven power management policies, the optimal
policy exhibits a threshold structure, i.e., when the traffic load is below a certain threshold, a node
should switch to the low-power state whenever possible and remain active otherwise. From our
analysis, contrary to popular belief, polling-based policies such as the IEEE 802.11 PSM are not
energy efficient for light traffic load.

•  To  evaluate  empirically  the  performance  of  our  proposed  power  management  schemes,  we  have 
devised  power  management  modules  and  a  communication  stack  in  TinyOS  with  emphasis  on  the 
generality and robustness of the communication primitives supported. The set of APIs can be used
for various routing and power management protocols. Then we implemented the on-demand power
management  framework  and  the  asynchronous  wakeup  mechanism  on  a  motes  testbed.  We  have  also 
carried  out  a  suite  of  benchmark  tests  to  profile  the  energy  consumption  of  sensor  nodes  in  TinyOS. 
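
As an illustration of the shift-invariant overlap property behind the asynchronous wakeup schedule described above, the following minimal sketch checks a small cyclic block design. It assumes the (7, 3, 1) difference set {1, 2, 4} modulo 7 as the awake-slot pattern and assumes that slot boundaries are aligned while clock offsets differ by whole slots; the report does not state which design was actually used, and the names `awake_slots`, `min_overlap`, and `duty_cycle` are hypothetical.

```python
from itertools import product

# Hypothetical schedule: the (7, 3, 1) cyclic difference set {1, 2, 4} mod 7.
# Each node is awake in 3 of every 7 slots. This particular design is an
# illustrative assumption, not necessarily the one used in the project.
PERIOD = 7
AWAKE_SLOTS = frozenset({1, 2, 4})


def awake_slots(offset, period=PERIOD, base=AWAKE_SLOTS):
    """Slots (mod period) in which a node whose clock is shifted by `offset` is awake."""
    return frozenset((s + offset) % period for s in base)


def min_overlap(period=PERIOD, base=AWAKE_SLOTS):
    """Smallest number of common awake slots over all pairs of clock offsets."""
    return min(
        len(awake_slots(a, period, base) & awake_slots(b, period, base))
        for a, b in product(range(period), repeat=2)
    )


def duty_cycle(period=PERIOD, base=AWAKE_SLOTS):
    """Fraction of slots in which a node is awake."""
    return len(base) / period
```

Under these assumptions, any two nodes share at least one awake slot per period regardless of their relative clock shift, whereas a naive pattern such as {0, 1, 2} modulo 7 can miss entirely (for example, at a relative shift of three slots).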

Fundamental  Performance  Limits  of  Wireless  Sensor  Networks  (Hou):  Driven  by  advances  in  MEMS 
micro-sensors,  wireless  networking,  and  embedded  processing,  ad-hoc  networks  of  devices  and  sensors  with 
(limited)  sensing  and  wireless  communication  capabilities  are  becoming  increasingly  available  for 
commercial  and  military  applications  such  as  environmental  monitoring  (e.g.,  traffic,  habitat,  security), 
industrial  sensing,  and  diagnostics  (e.g.,  factory,  appliances),  critical  infrastructure  protection  (e.g.,  power 
grids,  water  distribution,  waste  disposal),  and  situational  awareness  for  battlefield  applications.  Much  has 
been  written  about  how,  once  deployed,  these  wireless  networks  will  affect  the  way  we  monitor 
environments,  track  objects,  fight  wars,  and  recover  from  disasters. 

Interest in wireless sensor networks has indeed opened up new research areas and has led to a fairly
large number of research activities in the areas of protocol design and system building/prototyping.
However,  comparatively  little  work  has  been  done  on  understanding  the  fundamental  performance  limits  of 
wireless  sensor  networks,  e.g.,  the  asymptotic  behaviors  of  these  networks  with  respect  to  network  capacity, 
longevity,  scalability,  coverage,  connectivity,  and  critical  power  required  to  maintain  connectivity.  In  this 
study,  we  aimed  to  perform  a  rigorous  study  of  performance  limits  for  wireless  sensor  networks  with  respect 
to  coverage,  connectivity,  lifetime,  and  critical  power  analysis.  We  have  also  designed  the  protocols  that 
aim  to  approach  these  performance  limits.  Specifically,  we  have  carried  out  several  research  tasks  along  the 
following  lines: 

•  Critical Power, Node Degree and Other Node Attributes for Maintaining Connectivity. We
have investigated the critical power required (in the almost sure sense) to maintain connectivity
and  how  it  scales  as  the  network  size  or  density  increases.  Once  the  power  levels  used  by  nodes  are 
determined,  their  transmission  range,  neighbor  relation,  and  other  graph  properties  (such  as  clique 
number  and  chromatic  number)  are  also  determined.  Thus,  we  have  studied  the  relationship  between 
these  node/network  attributes  and  connectivity.  In  particular,  we  have  derived  the  critical  node 
degree  needed  to  maintain  network  connectivity,  where  the  critical  node  degree  is  the  smallest 
integer  d  such  that  the  d-nearest-neighbor  graph  over  the  sensor  network  maintains  connectivity. 
Also,  we  have  investigated  the  extent  to  which  topology  control  (i.e.,  the  mechanism  that  allows  each 
node  to  transmit  using  different  power  levels)  conserves  power  (in  the  asymptotic  sense),  and  have 
devised,  based  on  these  theoretical  findings,  localized  topology  control  algorithms. 

•  Optimal Conditions for k-coverage and Their Use in Devising Localized Algorithms. After the
sensing threshold (and hence the sensing range) is set, the area in which a sensor can perform its
sensing task is determined. To prolong the network lifetime, it is desirable that a subset of nodes is
selected (on a rotational basis) to provide k-coverage for at least a specified portion of the monitored area.
We have explored optimality conditions based on which a subset of working nodes can be chosen to
maintain k-coverage. We have devised, based on the optimality conditions, fully decentralized and
localized algorithms for k-coverage in large-scale sensor networks.

•  Network  Capacity  and  Lifetime.  Once  the  power  level  is  set,  and  the  coverage/connectivity 
condition  is  determined,  the  network  capacity  and  lifetime  can  be  analyzed.  We  seek  the  relationship 
between  the  minimum  energy  required  and  the  network  capacity  that  results.  We  believe  the  key  is 
to  model  the  locations  of  nodes,  as  well  as  the  propagation  attenuation,  and  then  to  study  the 
transport capacity. We are also looking into the relationship between k-coverage and network
lifetime and between k-coverage and node density. Given the lifetime T of a single wireless node,
the  network  lifetime  is  upper  bounded  by  T  times  the  maximum  value  of  k.  With  the  above  derived 
relationships,  we  can  answer  several  fundamental  questions  such  as  (a)  how  many  wireless  nodes 
(the  nodal  density)  have  to  be  deployed  in  a  region,  in  order  to  continuously  monitor  the  region  for  a 
period  of  time  kT,  and  (b)  given  the  number  of  wireless  nodes,  what  is  the  maximal  possible  lifetime 
that  can  be  achieved  by  any  algorithm? 

•  Large-scale  J-Sim  Network  Simulation.  J-Sim  supports  network  emulation  at  the  MAC  level, 
which  allows  us  to  interface  a  set  of  real-life  Berkeley  mica  motes  with  a  large-scale  sensor  network 
(of  desired  size),  seamlessly  transport  data  between  them,  and  carry  out  a  high-fidelity  scalability 
study.  Currently,  we  are  studying  (a)  whether  the  performance  limits  still  hold  when  some  of  the 
assumptions (e.g., the Poisson point assumption, the Toroidal model, and the sensing/transmission
disk assumption) are not valid, and (b) whether, and beyond what network size, the asymptotic
attributes  reasonably  characterize  network  behaviors. 
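
The critical node degree defined in the first task above (the smallest integer d such that the d-nearest-neighbor graph is connected) can be computed by brute force for a small, fixed deployment. The following is a minimal sketch, not the asymptotic analysis carried out in the project. It assumes an undirected graph in which an edge exists whenever either endpoint counts the other among its d nearest neighbors, and the function names are hypothetical.

```python
import math


def knn_edges(points, d):
    """Undirected edges of the d-nearest-neighbor graph (assumption: an edge
    exists if either endpoint lists the other among its d nearest nodes)."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:d]:
            edges.add((min(i, j), max(i, j)))
    return edges


def is_connected(n, edges):
    """Union-find connectivity check on nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(x) for x in range(n)}) == 1


def critical_degree(points):
    """Smallest d for which the d-nearest-neighbor graph over `points` is connected.

    Returns None when there are fewer than two points.
    """
    n = len(points)
    for d in range(1, n):
        if is_connected(n, knn_edges(points, d)):
            return d
    return None
```

For two well-separated pairs of nodes, such as (0, 0), (1, 0), (10, 0), and (11, 0), one nearest neighbor per node leaves the two pairs disconnected, so the critical degree of that deployment is 2.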

Protocols  for  Sensor  Networks  (Suda).  Sensor  networks  have  emerged  rapidly  to  provide  surveillance 
functions  in  a  variety  of  applications.  With  sensor  networks,  it  is  possible  to  provide  uninterrupted 
surveillance  of  a  large  area  without  human  intervention.  Our  sensor  network  research  has  focused  on  two 
topics,  network  coverage  and  data  dissemination.  On  the  topic  of  network  coverage,  we  have  investigated 
protocols  that  efficiently  maintain  surveillance  of  either  the  areas  of  interest  or  the  monitoring  targets  of 
interest.  We  have  three  ongoing  projects  in  this  area,  Protocol  for  Coverage-Aware  Sensor  Engagement  in 
Sensor  Networks,  Protocol  for  Monitoring  a  Target  from  Multiple  Viewing  Angles  in  Sensor  Networks,  and 
Protocol  for  Protecting  Targets  in  Sensor  Networks.  On  the  topic  of  data  dissemination,  we  have 
investigated  protocols  that  efficiently  store  sensing  data  in  a  network  and  disseminate  the  sensing  data  to 
users.  We  have  one  project  in  this  area,  Protocol  for  Data  Dissemination  in  Sensor  Networks.  More  details 
are  provided  below. 

Protocol  for  Coverage-Aware  Sensor  Engagement  in  Sensor  Networks  (Suda).  We  have  investigated  a 
new  protocol,  Coverage-Aware  Sensor  Engagement  (CASE),  to  efficiently  provide  required  network 
surveillance  of  the  areas  of  interest.  The  CASE  protocol  achieves  the  goal  by  selectively  activating  sensors 
according  to  the  requirements  of  applications.  To  select  which  sensors  should  be  activated,  each  sensor 
evaluates  its  own  contribution  to  network  surveillance  in  a  distributed  manner.  Based  on  its  contribution  to 
network  surveillance,  each  sensor  determines  if  it  should  stay  active  and  continue  monitoring  or  if  it  should 
deactivate  itself.  Sensors  with  higher  contribution  to  network  surveillance  preferably  remain  active, 
resulting  in  the  need  for  fewer  active  sensors  to  guarantee  enough  surveillance. 
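
The report does not specify how CASE computes a sensor's contribution, so the following is only a minimal sketch of the general idea of contribution-based deactivation. It assumes disk-shaped sensing regions, a finite set of sample points standing in for the area of interest, and a sequential pass in sensor-id order in place of the distributed decision process; all names and parameters are hypothetical.

```python
import math


def covers(sensor, point, sensing_range):
    """True if `point` lies within the sensing disk of `sensor`."""
    return math.dist(sensor, point) <= sensing_range


def coverage_degree(point, active, sensors, sensing_range):
    """Number of active sensors that cover `point`."""
    return sum(1 for sid in active if covers(sensors[sid], point, sensing_range))


def contribution(sid, active, sensors, points, sensing_range, k):
    """Count sample points covered by `sid` that the other active sensors
    cover fewer than k times."""
    others = active - {sid}
    return sum(
        1
        for p in points
        if covers(sensors[sid], p, sensing_range)
        and coverage_degree(p, others, sensors, sensing_range) < k
    )


def select_active(sensors, points, sensing_range, k=1):
    """Deactivate sensors whose contribution is zero, one at a time in id order."""
    active = set(sensors)
    for sid in sorted(sensors):
        if contribution(sid, active, sensors, points, sensing_range, k) == 0:
            active.discard(sid)
    return active
```

In this sketch a sensor stays active only if switching it off would leave some sample point with less than the required coverage degree k, which mirrors the idea that sensors with a higher contribution preferentially remain active.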

We  have  conducted  extensive  simulations  to  evaluate  the  performance  of  the  proposed  CASE  protocol  in 
terms  of  the  number  of  activated  sensors  and  the  communication  and  computation  costs.  The  performance 
metrics  are  measured  by  varying  sensor  network  density  and  the  surveillance  requirements  of  applications. 
Simulation results show that CASE, compared with existing work, guarantees the required network
surveillance with a smaller number of active sensors while incurring lower communication and computation
costs. In addition, we have investigated the performance of CASE in scenarios with location error
and with hotspot areas that need to be monitored with better surveillance. CASE is sufficiently robust
against location error and is able to support hotspot areas.

Protocol for Monitoring a Target from Multiple Viewing Angles in Sensor Networks (Suda). We
have investigated a protocol that selectively activates a group of sensors to form a monitoring structure
around  a  mobile  target  such  that  activated  sensors  monitor  the  target  from  multiple  viewing  angles.  The 
proposed  protocol  assumes  that  a  large  number  of  sensors  with  varying  capabilities  (e.g.,  different  sensing 
and  communication  capabilities,  and  different  mobility  capabilities)  are  randomly  deployed,  creating  a 
large-scale and dynamic sensor network environment. It also assumes that each sensor can be in one of three
modes (i.e., sleeping mode, listening mode, and active mode). On detecting a mobile target, active
sensors broadcast notifications to their neighboring sensors and activate neighboring sensors in listening
mode in order to form a monitoring structure. Based on the information contained in the notifications (e.g.,
viewing angles to the target, distance to the target, etc.), each activated sensor autonomously decides
whether  to  stay  in  active  mode  or  switch  to  sleeping  mode.  This  distributed  and  autonomous  activation 
process  results  in  the  self-organization  of  a  monitoring  structure  that  cannot  be  achieved  by  any  individual 
sensor. 

We  have  considered  different  scenarios  based  on  different  sensor  capabilities  and  have  developed  a  set  of 
distributed  protocols  to  selectively  activate  sensors  and  form  an  effective  monitoring  structure.  We  have 
designed  simulation  models  to  examine  the  set  of  protocols,  and  have  developed  an  emulator  to  investigate 
the  performance  of  the  protocol  running  on  real  hardware.  Simulation  and  emulation  results  confirm  the 
effectiveness  of  the  protocol. 

Protocol  for  Protecting  Targets  in  Sensor  Networks  (Suda).  We  have  investigated  a  protocol  to  monitor 
and  protect  a  mobile  target  in  a  large-scale  and  dynamic  sensor  network  consisting  of  a  large  number  of 
mobile  sensors  with  varying  capabilities.  The  proposed  protocol  enables  mobile  sensors  to  self-organize, 
without a central controller, forming the largest possible perimeter that surrounds a mobile target (or mobile
targets) such that no intruder can penetrate the perimeter without being detected. In the design of the
protocol, mobile sensors and mobile targets are modeled as particles that exist in nature (such as
molecules). The concept of repelling and attracting forces that exist among particles in nature (such as
inter-molecular forces) is introduced to determine the direction of movement of the sensors that form the
perimeter. By applying the inter-molecular force model to mobile sensors, the sensors form a perimeter
autonomously in a distributed manner.
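
The force-based movement rule can be illustrated with a minimal sketch. The force laws below are assumptions chosen for simplicity (a linear radial force that pulls a sensor toward a ring of radius `ring_radius` around the target, plus a linear short-range repulsion between sensors closer than `repel_range`); the report does not give the actual inter-molecular force model, and the names are hypothetical.

```python
import math


def net_force(i, sensors, target, ring_radius, repel_range):
    """Force on sensor i: radial pull/push toward the ring plus short-range repulsion."""
    x, y = sensors[i]
    dx, dy = x - target[0], y - target[1]
    r = math.hypot(dx, dy)
    fx = fy = 0.0
    if r > 0:
        # Attractive when outside the ring (r > ring_radius), repulsive when inside.
        gain = (ring_radius - r) / r
        fx, fy = gain * dx, gain * dy
    for j, (ox, oy) in enumerate(sensors):
        if j == i:
            continue
        sx, sy = x - ox, y - oy
        d = math.hypot(sx, sy)
        if 0 < d < repel_range:
            # Push away from a neighbor that is closer than repel_range.
            push = (repel_range - d) / d
            fx, fy = fx + push * sx, fy + push * sy
    return fx, fy


def step(sensors, target, ring_radius, repel_range, rate=0.5):
    """Move every sensor a fraction `rate` along its net force (synchronous update)."""
    forces = [
        net_force(i, sensors, target, ring_radius, repel_range)
        for i in range(len(sensors))
    ]
    return [(x + rate * fx, y + rate * fy) for (x, y), (fx, fy) in zip(sensors, forces)]
```

Each sensor needs only its own position, the target position, and the positions of nearby sensors, so the update can in principle run locally on every sensor without a central controller.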

We  have  designed  a  protocol  that  controls  the  movement  of  mobile  sensors  to  form  an  optimal  perimeter 
around  mobile  targets.  We  have  conducted  extensive  simulations  to  examine  the  scalability  and  robustness 
of  the  protocol.  Simulation  results  have  verified  the  effectiveness  of  the  protocol. 

Protocol for Data Dissemination in Sensor Networks (Suda). We have investigated a new protocol, the
Rendezvous Track protocol, for sensors to determine where to store sensing data and for users to determine
where  to  retrieve  the  sensing  data.  In  the  proposed  protocol,  each  sensor  applies  a  hash  function  to  the 
attributes  of  the  sensing  data,  yielding  the  location  of  a  group  of  sensors  to  store  the  sensing  data.  When 
retrieving  sensing  data,  a  user  applies  the  same  hash  function  to  the  attributes  of  the  desired  sensing  data  to 
obtain  the  location  of  a  group  of  sensors  that  store  the  desired  sensing  data.  This  group  of  sensors  serves  as 
rendezvous  sensors  for  sensing  data  with  some  particular  attributes  so  that  sensing  data  and  user  queries 
containing  the  same  attributes  will  always  converge  at  the  same  set  of  rendezvous  sensors,  leading  to 
successful  sensing  data  retrieval.  The  P.I.  also  developed  a  mechanism  through  which  the  proposed  protocol 
dynamically  adjusts  the  group  of  sensors  to  store  sensing  data  based  on  different  user  query  patterns. 
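
The rendezvous idea can be sketched as follows: a producer and a consumer that hash the same attributes arrive at the same location and therefore at the same group of sensors. This is an illustration under stated assumptions, not the Rendezvous Track protocol itself. The SHA-256 mapping, the rectangular field, and the rule of choosing the `group_size` sensors nearest to the hashed point are all hypothetical choices.

```python
import hashlib


def rendezvous_location(attributes, width, height):
    """Map data attributes to a point in a width x height field.

    Hypothetical mapping: SHA-256 over the sorted attribute names and values,
    so the result does not depend on the order in which attributes are listed.
    """
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    x = int.from_bytes(digest[:8], "big") / 2**64 * width
    y = int.from_bytes(digest[8:16], "big") / 2**64 * height
    return (x, y)


def rendezvous_group(location, sensors, group_size):
    """Ids of the `group_size` sensors closest to the hashed location (assumed rule)."""
    ranked = sorted(
        sensors,
        key=lambda sid: (sensors[sid][0] - location[0]) ** 2
        + (sensors[sid][1] - location[1]) ** 2,
    )
    return ranked[:group_size]
```

A sensor storing a reading and a user issuing a query both call `rendezvous_location` with the same attributes, so both are directed to the same rendezvous sensors without any directory lookup.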

We  have  designed  the  Rendezvous  Track  protocol  with  dynamic  adaptation  to  user  query  patterns.  We  have 
also  designed  and  implemented  simulation  models  to  investigate  the  scalability,  robustness,  adaptability, 
and  efficiency  of  the  protocol.  Simulation  results  have  verified  the  effectiveness  of  the  proposed  protocol. 

Spoken  Natural  Language  Interface  for  a  Personal  Handheld  Device  (Melliar-Smith  and  Moser). 

Mobile wireless devices, such as cell phones, can potentially provide many useful services to their users,
but they are currently limited by inadequate input and output capabilities: a small screen and a missing or unusable
keyboard and mouse. Karthik Perumalsamy, Rama Alebouyeh, Michael Schuricht, Yi Xin Hu, Shreyas
Prasad,  Zachary  Davis,  Michael  Melliar-Smith  and  Louise  Moser  have  experimented  with  speech  input  and 
output  to  augment  the  small  screen  and  facilitate  the  provision  of  services  to  the  user. 

For  speech  synthesis,  we  used  the  NaturalVoices  package  developed  by  AT&T  Laboratories,  with  excellent 
results.  Speech  output  is  a  poor  mechanism  for  communicating  complex  information,  such  as  maps,  lists, 
etc.  Consequently,  our  applications  use  the  screen  to  display  such  complex  information.  However,  speech 
output  is  an  excellent  mechanism  for  responding  to  and  confirming  speech  input. 

For  speech  recognition,  we  used  the  DynaSpeak  software  developed  at  SRI  International  for  DARPA.  In 
the  current  state  of  the  art,  it  is  necessary  to  use  a  restricted  vocabulary  and  grammar  that  is  appropriate  to  a 
specific  service  or  application.  It  is  also  necessary  to  train  the  recognizer  for  a  specific  user,  the  owner  of 
the  mobile  device.  This  training  can  achieve  much  better  recognition  than  is  possible  for  call  centers  that 
must  handle  a  wide  range  of  users. 


The Program Manager determines which service to invoke.

We  developed  several  natural  language  speech  input  applications  for  the  mobile  wireless  handheld  device, 
including  a  Contact  Service,  a  Location  Service,  and  an  Appointment  and  Reminder  Service.  These 
services operate quite well, and switching between the services is reasonably natural and convenient.
The mobile wireless device must support multiple applications concurrently, and some single sentences
involve  multiple  applications.  For  example,  in  the  sentence 
“Is  there  an  Italian  restaurant  near  Susan’s  house?” 

the phrase “... Susan’s house” is handled by the Contact Service, while the phrase “Is there an Italian
restaurant  near  ...”  is  handled  by  the  Location  Service.  Thus,  the  Program  Manager  initiates  the  process  of 
speech  recognition  with  a  scan  for  keywords.  DynaSpeak  then  reparses  the  sentence  using  a  vocabulary  and 
grammar  appropriate  to  the  service  corresponding  to  the  detected  keywords.  If  keywords  are  detected  for 
more  than  one  service,  the  sentence  is  parsed  separately  for  each  of  those  services.  The  grammars  contain 
wild  cards  that  can  match  anything  and,  in  particular,  phrases  that  a  different  service  should  handle.  While 
not  instantaneous,  multiple  parses  of  a  single  sentence  can  be  performed  sufficiently  rapidly  to  achieve 
reasonably  natural  and  timely  response  to  human  speech  input. 
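
The keyword scan performed by the Program Manager can be sketched as a simple dispatch step. The keyword lists below are invented for illustration (the report does not give the actual vocabularies), and the subsequent reparse by DynaSpeak with a service-specific grammar is outside the scope of this sketch.

```python
import re

# Hypothetical keyword lists; the actual vocabularies are not given in the report.
SERVICE_KEYWORDS = {
    "Contact Service": {"house", "phone", "address", "email"},
    "Location Service": {"near", "restaurant", "directions", "where"},
    "Appointment and Reminder Service": {"appointment", "remind", "meeting", "schedule"},
}


def services_for(sentence, keywords=SERVICE_KEYWORDS):
    """Return the services whose keywords occur in the sentence, in table order.

    Every returned service would then parse the sentence separately with its
    own vocabulary and grammar.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return [name for name, kws in keywords.items() if words & kws]
```

With these assumed keyword lists, the example sentence “Is there an Italian restaurant near Susan’s house?” selects both the Contact Service (through “house”) and the Location Service (through “restaurant” and “near”).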

To  develop  and  demonstrate  this  technology,  we  used  a  small  handheld  PC,  manufactured  by  OQO.  This 
device  contains  a  1  GHz  Transmeta  processor  and  512  Mbytes  of  RAM,  and  is  fast  enough  to  run  the 
speech  recognition  software.  The  device  operates  under  the  Windows  operating  system  to  provide  a 
convenient  development  environment.  The  device  has  a  poor  microphone  with  no  noise  cancellation,  which 
makes  the  device  very  sensitive  to  background  noise.  Moreover,  the  device  has  no  loudspeaker  and,  thus, 
we  employ  either  a  Bluetooth  earpiece  or  a  WiFi  link  to  a  laptop  computer  with  a  loudspeaker.  The  WiFi 
link  also  provides  the  Internet  access  that  several  of  the  applications  need. 

4.6  Bio-Networking  Architecture 

The  Bio-Networking  Architecture,  developed  by  Tatsuya  Suda,  is  motivated  by  the  observation  that 
desirable  properties  of  future  networks  (such  as  scalability,  adaptability,  survivability  and  availability)  have 
already  been  realized  in  various  biological  systems.  The  Bio-Networking  Architecture  is  a  framework  for 
developing  large-scale,  highly  distributed,  heterogeneous  and  dynamic  network  applications. 

In  the  Bio-Networking  Architecture,  key  biological  principles  and  mechanisms  are  applied  to  design 
network  applications.  A  network  application  is  implemented  as  a  decentralized  collection  of  autonomous 
objects  called  cyber-entities,  illustrated  in  the  figure  below.  This  structure  is  analogous  to  a  bee  colony  (a 
network  application)  consisting  of  multiple  bees  (cyber-entities).  Each  cyber-entity  implements  a  functional 
component  related  to  its  service  or  application,  and  follows  simple  behavior  rules  (e.g.  replication, 
reproduction  and  migration)  similar  to  biological  entities. 


Cyber-entities  and  network  applications. 

The  key  aspects  of  the  Bio-Networking  Architecture  supported  by  this  AFOSR  MURI  grant  are: 

•  Investigation  of  Dynamic  Service  Composition 

•  Investigation  of  Adaptation  and  Evolution  of  cyber-entities 

•  Design  and  Implementation  of  the  Bio-Networking  Platform. 

Major  accomplishments  in  the  investigation  of  the  Bio-Networking  Architecture  are  summarized  below. 

Dynamic  Service  Composition  (Suda).  Dynamic  service  composition,  i.e.,  composing  an  application 
through  autonomous  interactions  among  multiple  service  components  (i.e.  cyber-entities)  at  runtime,  is  a 
key  feature  of  the  Bio-Networking  Architecture.  To  enable  dynamic  service  composition,  we  have  proposed 
a  new  component  model,  Component  Service  Model  with  Semantics  (CoSMoS).  CoSMoS  integrates  the 
semantic  information  and  the  functional  information  of  a  service  component  into  a  single  semantic  graph 
representation. We have also developed a unified interface, Component Runtime Environment (CoRE), in
order to convert different implementations of service components into the CoSMoS representation. Using
the  semantic  support  of  CoSMoS,  we  have  developed  a  semantics-based  service  composition  protocol, 
Semantic  Graph  based  Service  Composition  (SeGSeC).  SeGSeC  generates  the  workflow  of  the  application 
requested  by  a  user  and  examines  the  semantics  of  the  workflow  against  the  user’s  request.  See  the  figure 
below. 

Using  CoSMoS,  CoRE  and  SeGSeC,  we  have  implemented  a  service  composition  system  and 
demonstrated  that  the  implemented  service  composition  system  supports  the  semantics-based  dynamic 
service  composition.  We  have  also  evaluated  the  performance  of  SeGSeC  using  the  implemented  system. 

Architecture overview. Layers, from top to bottom: SeGSeC (service composition mechanism); CoRE (middleware for CoSMoS); CoSMoS (component representation); various component technologies (e.g., SOAP/UDDI, CORBA, UPnP, Jini).

Adaptation  and  Evolution  (Suda).  Adaptation  and  evolution  through  natural  selection  are  key  biological 
concepts  that  we  apply  in  the  Bio-Networking  Architecture.  We  have  designed  evolutionary  mechanisms  for 
cyber-entities  and  demonstrated  through  simulations  that  network  applications  implemented  using  the  Bio- 
Networking  Architecture  evolve  and  adapt  to  a  wide  variety  of  network  environments  without  relying  on 
central  control. 

The  research  on  adaptation  and  evolution  in  the  Bio-Networking  Architecture  has  focused  on  the 
development  of  a  generic  framework  where  multiple  types  of  cyber-entities  interact  and  co-evolve  in  the 
network. We have extended the simulator to accommodate interactions among various types of cyber-entities.
The extended simulator has been used to investigate how the cyber-entities evolve their interactions
through  natural  selection. 

Design  and  Implementation  of  the  Bionet  Platform  (Suda).  The  middleware  platform  in  the  Bio- 
Networking  Architecture,  the  Bionet  platform,  provides  reusable  software  components  for  developing, 
deploying  and  executing  cyber-entities.  The  components  abstract  low-level  operating  and  networking  details 
(e.g.,  I/O,  concurrency,  messaging  and  network  connection  management),  and  implement  high-level  runtime 
services  that  cyber-entities  use  to  perform  their  services.  The  components  in  the  Bionet  platform  are  based 
on  several  biological  concepts  (e.g.,  energy  exchange  and  pheromone  emission).  We  have  designed  and 
implemented  the  Bionet  platform,  and  have  empirically  demonstrated  that  the  Bionet  platform  is  efficient, 
scalable,  reusable,  and  significantly  simplifies  development  of  network  applications. 

We  have  worked  actively  in  the  Object  Management  Group  (OMG)  to  reflect  the  key  designs  of  the  Bio- 
Networking  platform  in  the  OMG  Super  Distributed  Objects  specification.  With  partner  organizations  such 
as  Hitachi,  Ltd.  and  GMD  FOKUS  (a  national  institute  for  computer  science  in  Germany),  we  have 
developed  a  formal  specification  for  Super  Distributed  Objects,  which  has  been  adopted  as  a  standard  by 
the  OMG. 

Protocols  for  Distributed  Discovery  in  Peer-to-Peer  Systems  (Suda).  In  the  Bio-Networking 
Architecture, a network application is implemented as a collection of distributed cyber-entities. Cyber-entities
are distributed over a network and also migrate among network nodes. In such an environment,
where network objects are distributed over a network and dynamically migrate, there is a need for a
protocol that is capable of locating network objects.

To locate distributed network objects, we have developed two distinct discovery protocols. In both
discovery  protocols,  network  objects  contain  relationships  (links  or  pointers)  to  one  another.  Discovery 
queries  originate  from  a  network  object  and  travel  from  network  object  to  network  object  through 
relationships  in  a  decentralized  manner.  Each  network  object  contains  a  set  of  keywords  that  describe  the 
contents  of  the  network  object.  Relationships  of  network  objects  contain  information  (e.g.,  keywords) 
regarding  the  relationship  partner,  providing  a  mechanism  to  guide  discovery  queries. 

The first discovery protocol introduces keyword similarity among network objects to organize them into
clusters, as well as relationship history to track each relationship partner’s past discovery performance. The
second  discovery  protocol  utilizes  the  preference  of  discovery  results  (discovery  hits)  by  the  discovery 
originator  (i.e.,  a  user  or  a  network  object),  and  applies  probabilistic  forwarding  based  on  the  preference  of 
discovery  hits  to  improve  the  adaptability  of  discovery  to  dynamic  environments. 
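
The forwarding step of the second protocol can be illustrated with a minimal Python sketch. It scores each relationship by keyword overlap with the query and then picks the next hop at random, in proportion to a weight that also reflects past hit preference. The Jaccard similarity, the `preference` field, the `floor` parameter, and all function names are assumptions made for illustration; this report does not give the protocols' actual formulas.

```python
import random

def keyword_similarity(a, b):
    """Jaccard similarity between two keyword sets (an assumed metric)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def forwarding_weights(query_keywords, relationships, floor=0.05):
    """Weight each relationship partner for a given query.

    relationships maps a partner id to a dict with:
      'keywords'   - keywords stored about the partner
      'preference' - hypothetical score in [0, 1] learned from past hits
    The floor keeps every partner reachable so the search can adapt.
    """
    weights = {}
    for partner, info in relationships.items():
        sim = keyword_similarity(query_keywords, info["keywords"])
        weights[partner] = floor + sim * (0.5 + 0.5 * info["preference"])
    return weights

def choose_next_hop(query_keywords, relationships, rng=random):
    """Pick one partner at random, in proportion to its weight."""
    weights = forwarding_weights(query_keywords, relationships)
    partners = list(weights)
    return rng.choices(partners, weights=[weights[p] for p in partners], k=1)[0]
```

Because the choice is random rather than greedy, a partner whose stored keywords are stale still receives an occasional query, which is one way a protocol of this kind could notice changes in a dynamic network.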

We conducted extensive simulations to investigate the adaptability of the two discovery protocols to
dynamic  and  heterogeneous  network  environments.  In  the  simulated  dynamic  network,  network  objects  may 
move,  affecting  connectivity  of  network  objects  and  potentially  making  some  network  objects  temporarily 
unreachable  and  unavailable.  Moreover,  some  network  objects  (e.g.,  data  such  as  Web  pages  and  files) 
might  continuously  change  their  contents  and  properties  (e.g.,  keywords  describing  their  contents,  and  times 
that  they  are  last  updated).  In  the  simulated  heterogeneous  network,  network  objects  have  heterogeneous 
capabilities  (e.g.,  processing  power,  storage  size)  in  forwarding  discovery  queries  and  users  have 
heterogeneous  requirements  for  discovery  (e.g.,  requiring  fast  discovery,  low  overhead  discovery). 
Simulation  results  have  demonstrated  the  adaptability  of  the  two  discovery  protocols  to  dynamic  and 
heterogeneous  network  environments. 

4.7  Middleware  for  Supporting  Application  Programs  with  End-to-End  QoS 

TAO  Pluggable  Protocols  Framework  (Schmidt):  One  of  the  problems  with  conventional  middleware 
solutions is that they hard-code the distributed object programming model together with the use of
general-purpose network protocols (such as TCP/IP) that are inadequate to meet the QoS needs of distributed
real-time applications. To overcome this problem, another part of our PERC project has focused on enhancing
the  TAO  pluggable  protocols  framework  shown  in  the  following  figure  so  that  developers  can  concentrate 
on  designing  their  application-specific  logic  without  having  to  worry  about  timing,  fault  tolerance,  mobility, 
or  security  issues.  The  key  TAO  pluggable  protocols  framework  components  are  described  briefly  below: 

•  ORB Messaging Component: This component is responsible for implementing ORB messaging
protocols, such as the standard CORBA General Inter-ORB Protocol (GIOP), as well as custom
Environment-Specific Inter-ORB Protocols (ESIOPs). An ORB messaging protocol defines a data
representation, an ORB message format, an ORB transport protocol or transport adapter, and an
object addressing format. Within this framework, ORB protocol developers are free to implement
optimized Inter-ORB protocols and enhanced transport adapters, as long as they respect the
standard CORBA interfaces.

•  ORB  Transport  Adapter  Component:  This  component  maps  a  specific  ORB  messaging  protocol, 
such  as  GIOP  or  DCE-CIOP,  onto  a  specific  instance  of  an  underlying  transport  protocol,  such  as 
TCP,  SCTP,  or  ATM.  The  figure  shows  an  example  in  which  TAO's  transport  adapter  maps  the 
GIOP messaging protocol onto TCP, the standard mapping called IIOP. In this case, the ORB
transport  adapter  combined  with  TCP  corresponds  to  the  Transport  layer  in  the  Internet  reference 
model.  However,  if  ORBs  communicate  over  an  embedded  interconnect,  such  as  a  VME  bus,  the 
bus  driver  and  DMA  controller  provide  the  Transport  layer  in  the  communication  infrastructure. 

•  ORB  Policy  Control  Component:  We  have  defined  QoS  APIs  that  allow  applications  to  specify 
their  QoS  requirements  using  industry-standard  CORBA  IDL  interfaces.  In  particular,  TAO's 
pluggable  protocols  framework  provides  an  extensible  policy  control  component  that  implements  the 
QoS  framework  defined  in  the  CORBA  Messaging  and  Real-Time  CORBA  specifications.  This 
component  allows  applications  to  control  the  QoS  attributes  of  configured  ORB  transport  protocols. 
Example  policies  for  pluggable  protocols  include  buffer  pre-allocations,  fragmentation,  bandwidth 
reservation,  and  maximum  transport  queue  sizes.  Policies  in  CORBA  can  be  set  at  the  ORB,  thread, 
or  object  level.  Thus,  application  developers  can  set  global  policies  that  take  effect  for  any  request 
issued in a particular ORB. Moreover, these global settings can be overridden on a per-thread basis,
a per-object basis, or even before a particular request. CORBA's Policy framework provides
fine-grained control over the ORB behavior while providing simplicity in the common case.
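
The override order described for the policy control component can be sketched in a few lines of Python. This is an illustration of layered policy resolution only, not TAO's API: the function name, the policy names, and the choice to model each scope as a plain dictionary are assumptions.

```python
from collections import ChainMap

def effective_policies(orb, thread=None, obj=None, request=None):
    """Resolve policies so that narrower scopes override broader ones.

    Each argument is a dict of policy name -> value. Lookup order is
    request, object, thread, ORB, so a value set for a single request
    wins over an object-level, thread-level, or ORB-wide default.
    """
    layers = [layer for layer in (request, obj, thread, orb) if layer]
    return dict(ChainMap(*layers))
```

For example, with an ORB-wide default of `{"max_queue": 64}` and a thread-level override of `{"max_queue": 16}`, the effective value for requests issued from that thread is 16, while other threads continue to see 64.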

QoS-Enabled  Network  Protocols  Supported  by  TAO’s  Pluggable  Protocols  Framework  (Schmidt): 

We  have  implemented  and/or  integrated  a  wide  range  of  network  protocols  into  TAO’s  pluggable  protocols 
framework.  These  protocols  can  be  classified  into  the  following  main  categories: 

•  General-Purpose Network Protocols, such as TCP/IP, SSL over TCP/IP, UDP/IP, UNIX domain
sockets, and shared memory.

•  Embedded System Interconnects, such as VME, Fibre Channel, and SCRAMNet.

•  QoS-Enabled  Network  Protocols,  such  as  IntServ,  DiffServ,  and  the  Stream  Control  Transmission 
Protocol  (SCTP). 

Our  accomplishments  include  integrating  DiffServ  and  SCTP  support  into  TAO’s  pluggable  protocols 
framework.  By  adding  DiffServ  capabilities  to  the  TAO  ORB  and  then  mapping  Real  Time  CORBA 
priority  values  to  Diffserv  service  classes,  we  have  provided  a  rich  middleware  platform  for  developing  and 
deploying  QoS-enabled  distributed  real-time  applications  that  can  preserve  task  priorities  via  OS  schedulers 
and network prioritization mechanisms end-to-end. Similarly, the Stream Control Transmission Protocol
(SCTP) provides a highly configurable, connection-oriented, message and bytestream transport service.
SCTP exposes a large set of parameters that can be configured via TAO’s QoS policy framework to
customize the connection performance to specific application requirements. Properties that can be
customized include message ordering semantics (e.g., ordered or unordered), reliability semantics (e.g.,
retransmit  timeouts  and  max  retransmit  tries),  connection  multiplexing  (e.g.,  number  of  streams),  and 
network  path  multiplexing  (e.g.,  network  interface  set).  SCTP's  customizable  properties  can  be  leveraged  by 
distributed  real-time  CORBA  middleware  and  applications  to  reduce  the  complexity  of  developing  high 
performance,  fault  tolerant  systems. 
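
To make the priority mapping concrete, the following Python sketch maps a Real-Time CORBA priority to a DiffServ code point (DSCP). The DSCP constants are the standard values; the priority bands in `PRIORITY_BANDS` are hypothetical and chosen only for illustration, since the mapping is application-specific and this report does not list the bands used in TAO.

```python
# Standard DiffServ code points (RFC 2474, RFC 2597, RFC 3246).
DSCP_BEST_EFFORT = 0
DSCP_AF11 = 10
DSCP_AF21 = 18
DSCP_AF31 = 26
DSCP_AF41 = 34
DSCP_EF = 46

# Hypothetical bands: (lowest priority in band, DSCP), highest band first.
PRIORITY_BANDS = [
    (30000, DSCP_EF),
    (20000, DSCP_AF41),
    (10000, DSCP_AF31),
    (5000, DSCP_AF21),
    (1000, DSCP_AF11),
    (0, DSCP_BEST_EFFORT),
]

def dscp_for_priority(priority):
    """Map a Real-Time CORBA priority (0..32767) to a DSCP value."""
    if not 0 <= priority <= 32767:
        raise ValueError("RT-CORBA priorities range from 0 to 32767")
    for lower_bound, dscp in PRIORITY_BANDS:
        if priority >= lower_bound:
            return dscp

def tos_byte(dscp):
    """DSCP occupies the upper six bits of the IP TOS / Traffic Class byte."""
    return dscp << 2
```

A transport that supports it could then mark outgoing packets with `tos_byte(dscp_for_priority(p))`, so that routers apply the per-hop behavior matching the priority of the invoking task.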

ZEN  (Klefstad  and  Schmidt):  We  have  designed  and  developed  ZEN,  which  is  a  freely  available  open 
source  middleware  that  is  compliant  with  most  of  the  features  defined  in  the  CORBA  2.3  specification.  Its 
design  is  based  on  many  of  the  patterns,  techniques,  and  lessons  learned  from  the  development  of  TAO. 
ZEN is implemented in Java to maximize ease of use and has a micro-kernel architecture that minimizes
footprint  for  memory  constrained  distributed  real-time  embedded  systems. 

ZEN's  architecture  (see  the  figure  below)  is  based  on  the  layered  pluggability  model,  which  allows  unused 
ORB components to be factored out of the ORB core to minimize middleware footprint. An extensible,
component-based POA architecture allows customization based on policies, and active de-multiplexing
optimizations associate client requests with target objects in constant time, regardless of the POA
hierarchy.  Cross-cutting  concerns  that  cannot  be  captured  within  a  module  or  a  class,  such  as  logging,  are 
implemented  in  ZEN  using  Aspect  Oriented  Programming  techniques.  The  Portable  Interceptor 
specification  has  been  implemented  using  AspectJ. 

RTZen (Klefstad). We have also designed and developed RTZen, the successor of ZEN. RTZen is
implemented using the Real-Time Specification for Java (RTSJ) and is compliant with most of the features
defined  in  the  CORBA  2.3  specification  and  major  portions  of  the  Real-Time  CORBA  specification.  Like 
its  predecessor,  it  is  freely  available  and  open  source. 


Unlike  monolithic  object  request  brokers,  ZEN's  pluggable  micro-kernel  architecture  can  be  customized  based  on 
design  preferences,  allowing  greater  flexibility  and  optimization. 

RTZen's  design  is  based  on  newly  discovered  RTSJ  design  patterns,  in  addition  to  patterns,  techniques,  and 
lessons  learned  from  the  development  of  ZEN.  The  key  innovation  behind  RTZen  is  its  memory  hierarchy, 
which  demonstrates  how  the  RTSJ's  scoped  memory  can  provide  highly  predictable  memory  management. 
Besides  providing  the  benefits  of  ZEN,  RTZen  offers  much  higher  predictability  and  throughput  due  to  its 
scoped  memory  hierarchy,  illustrated  in  the  figure  below.  The  ORB  Core  of  RTZen  facilitates  the  use  of 
key  RTSJ  features  (e.g.,  immortal  memory,  scoped  memory,  and  no-heap  real-time  threads)  to  enhance 
middleware predictability. Real-time concurrency and dispatching in RTZen are provided via
thread-pooling and reactive I/O mechanisms. Active de-multiplexing optimizations associate client
requests  with  target  objects  in  constant  time,  regardless  of  the  POA  hierarchy.  RTZen  also  uses  intelligent 
component  creation  strategies  to  minimize  the  time  and  space  overhead  for  non-real-time  applications. 
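
The idea behind active de-multiplexing can be shown with a short Python sketch: the object key handed to clients encodes the slot index of the target, so dispatch is a single table access instead of a search through nested POAs. The class below is a generic illustration of that technique, not RTZen's code; the class name, the generation counter, and the use of a tuple as the key are assumptions.

```python
class ActiveObjectMap:
    """Table where the object key itself encodes the slot index.

    A key is (index, generation). Lookup is a single list access, so its
    cost does not depend on how many objects or nested POAs exist.
    """

    def __init__(self):
        self._slots = []   # each slot holds [generation, servant or None]
        self._free = []    # indices of slots available for reuse

    def activate(self, servant):
        """Register a servant and return the key clients will present."""
        if self._free:
            index = self._free.pop()
            slot = self._slots[index]
            slot[0] += 1   # a new generation invalidates stale keys
            slot[1] = servant
        else:
            index = len(self._slots)
            self._slots.append([0, servant])
        return (index, self._slots[index][0])

    def deactivate(self, key):
        """Release the slot named by key so that it can be reused."""
        index, _ = key
        if self.lookup(key) is None:
            raise KeyError(key)
        self._slots[index][1] = None
        self._free.append(index)

    def lookup(self, key):
        """Return the servant for key, or None if the key is stale."""
        index, generation = key
        if 0 <= index < len(self._slots):
            slot_generation, servant = self._slots[index]
            if slot_generation == generation:
                return servant
        return None
```

The generation counter is the design choice that makes slot reuse safe: a client holding a key for a deactivated object cannot accidentally reach whatever servant later occupies the same slot.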

RTZen  runs  on  top  of  both  interpreted  and  ahead-of-time  compiled  RTSJ  platforms,  as  well  as  conventional 
Java  Virtual  Machines.  Currently  it  provides  real-time  predictability  on  the  TimeSys  Linux  operating 
system  compiled  using  TimeSys's  RI,  Purdue's  Open  Virtual  Machine,  and  Washington  University's  jRate. 
Associated with RTZen is a tool, called IsoLeak, that visualizes the hierarchies of scoped memory
areas in RTSJ applications and locates memory leaks.

ZEN-kit:  Graphical  Customization  of  Real-Time  CORBA  for  Embedded  Systems  (Klefstad). 

Customizing  real-time  CORBA  for  an  application  can  considerably  reduce  the  size  of  the  middleware  and 
improve  its  performance.  However,  customizing  middleware  is  an  error-prone  task  and  requires  deep 
knowledge  of  the  CORBA  standard  as  well  as  the  middleware  design.  The  RT-CORBA  features  of  RTZen 
are  modularized  in  a  hierarchical  form  (see  the  figure  below)  so  that  they  can  be  customized  at  different 
levels  of  granularity.  This  customization  is  achieved  through  modularizing  the  middleware  so  that  features 
may  be  inserted  or  removed  based  on  the  application  requirements. 

ZEN-kit is a graphical tool that embodies the principles of modular customization of RTZen for
low-footprint devices. ZEN-kit employs two mechanisms for the modularization of RTZen components and
aspects, and provides a configuration strategy for composing customized real-time middleware that
yields low-footprint ORBs.

Empirical  Evaluation  of  Java-based  CORBA  Component  Model  for  Embedded  and  Real-Time 
Systems (Klefstad). Component technology can overcome many limitations of conventional Object
Request Brokers (ORBs) in developing Distributed Real-time Embedded (DRE) applications. The
CORBA Component Model (CCM) enables the composition and reuse of software
components  and  the  configuration  of  key  non-functional  aspects  of  DRE  systems  such  as  timing,  fault- 
tolerance,  and  security.  However,  CCM  can  introduce  an  additional  overhead  to  the  runtime  performance  of 
middleware.  CCM  implementations  also  add  significantly  to  the  code  size  of  an  application.  Hence,  the 
overhead  and  effects  of  using  CCM  need  to  be  evaluated  in  order  to  effectively  employ  it  in  the  design  of 
high-reliability  DRE  applications. 

We empirically evaluated the performance of OpenCCM, a Java-based implementation of the CCM
standard. OpenCCM was configured with OpenORB, a desktop Java ORB, and ZEN, a real-time Java ORB.
We  provided  throughput  and  latency  measurements  of  method  invocations,  as  well  as  measurements  for 
both  the  ORBs  configured  with  and  without  OpenCCM.  We  measured  the  performance  overhead  of  using 
OpenCCM  on  each  ORB.  We  also  measured  the  additional  memory  requirement  introduced  by  the  CCM 
implementation  and  provided  an  analysis  of  the  pros  and  cons  of  using  component  technology  in  DRE 
systems.

A screen shot of the graphical user interface of ZEN-kit.

Integration  of  Load  Balancing  with  Fault  Tolerance  in  Distributed  Systems  (Moser  and  Melliar- 
Smith).  Existing  CORBA  load  balancing  systems  lack  fault  tolerance  and  CORBA  fault  tolerance 
infrastructures  do  not  provide  load  balancing.  Integration  of  fault  tolerance  with  load  balancing  provides  a 
better  quality  of  service  to  the  users  of  the  applications  by  making  the  infrastructure  and  applications  more 
robust,  available  and  reliable,  and  by  providing  better  scalability,  response  times  and  throughput. 

Aditya  Singh,  Louise  Moser  and  Michael  Melliar-Smith  integrated  TAO's  Load  Balancer  for  CORBA  with 
Eternal  System's  FTORB  fault  tolerance  infrastructure.  TAO's  Load  Balancer  balances  the  load  of  the 
clients  across  multiple  copies  of  the  server  using  a  random  assignment  policy,  while  FTORB  replicates 
CORBA  applications  using  either  active  or  passive  replication  and  thus  makes  them  fault-tolerant.  The  load 
balancer  and  the  applications  are  replicated  so  that  the  applications  can  continue  to  operate  despite  faults. 
Overheads  are  higher  than  for  either  load  balancing  or  fault  tolerance  alone,  but  are  still  reasonable. 

Protocols  and  Analysis  Model  for  QoS  Support  in  Networks  (Suda).  We  have  investigated  various 
network  protocols  dealing  with  QoS  under  different  network  environments,  from  ad-hoc  sensor  networks 
composed  of  capability-limited  small  devices  (i.e.,  sensors)  to  the  Internet  core  comprising  powerful 
backbone  routers.  In  the  sensor  network  QoS  project,  we  have  developed  a  QoS  analysis  model  for  sensor 
networks,  which  analyzes  a  sensor  network  holistically.  In  the  Internet  QoS  project,  we  have  investigated 
mixed  UDP  and  TCP  traffic,  as  well  as  inter-domain  mechanisms.  The  issues  that  we  considered  include 
improving  the  efficiency  of  best-effort  traffic  and  the  management  of  inter-domain  premium  traffic.  Results 
of  the  research  effort  in  each  project  are  described  in  more  detail  below. 

Sensor Network QoS Analysis Model (Suda). In Wireless Sensor Networks (WSNs), Quality of Service
(QoS) research has so far focused in isolation on either certain functional layers or certain application scenarios. We
have  proposed  a  holistic  wireless  sensor  network  QoS  framework  as  the  next  step  in  wireless  sensor 
network  QoS  research.  Using  the  proposed  framework,  application  designers  will  be  able  to  unambiguously 
specify  QoS  requirements,  and  system  engineers  will  be  able  to  holistically  trace  relationships  and  evaluate 
tradeoffs  in  the  design  and  deployment  of  QoS-aware  wireless  sensor  network  applications.  The  tradeoff 
relationships  can  be  used  to  further  understand  how  QoS  requirements  for  wireless  sensor  networks  interact. 

We have approached the development of this framework by defining wireless sensor network QoS
requirements  within  a  WSN  reference  architecture,  and  then  analyzing  how  the  defined  wireless  sensor 
network  QoS  requirements  impact  each  other  within  this  QoS  framework.  We  have  also  proposed  a 
methodology  with  formal  rules  to  identify  such  tradeoffs. 

Router  Queue  Management  for  Mixed  Traffic  (Suda).  We  have  proposed  a  robust  congestion  controller 
against  a  mix  of  unresponsive  and  responsive  traffic.  In  this  project,  we  have  designed  a  new  framework  for 
Internet  routers  with  Active  Queue  Management  (AQM)  algorithms  to  be  more  robust  against  a  mixture  of 
different  traffic  sources  running  different  protocols  (e.g.,  FTP,  HTTP,  UDP,  etc.).  We  are  the  first  to 
introduce  a  wavelet-based  de-noising  technique  to  remove  the  impact  of  unresponsive  traffic  on  the  AQM 
controller.  By  introducing  a  separate  traffic  de-noising  filter,  the  AQM  controller  is  capable  of  removing  the 
impact  of  unresponsive  traffic  on  the  performance  of  the  responsive  traffic  (i.e.,  long-lasting  TCP  traffic). 
This  technique  enables  a  new  paradigm  in  the  design  of  AQM  controller  with  mixed  traffic  in  the  Internet. 

We  have  designed  the  wavelet  de-noising  filter  that  is  used  to  separate  and  remove  unresponsive  traffic 
from  the  responsive  traffic  in  the  Internet  traffic.  We  have  conducted  extensive  simulations  to  examine  the 
efficiency  and  robustness  of  the  proposed  design. 
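
As a rough illustration of wavelet de-noising in general, the Python sketch below applies one level of the Haar transform to a series of traffic samples, shrinks the detail coefficients with a soft threshold, and reconstructs the series. The choice of the Haar wavelet, the single decomposition level, the soft-threshold rule, and the function names are all assumptions; this report does not specify the wavelet or threshold used in the project's filter, nor which signal the AQM controller observes.

```python
import math

def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def soft_threshold(values, threshold):
    """Shrink coefficients toward zero; small ones become exactly zero."""
    return [math.copysign(max(abs(v) - threshold, 0.0), v) for v in values]

def denoise(samples, threshold):
    """Remove small, fast fluctuations from a series of load samples."""
    approx, detail = haar_forward(samples)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

With a threshold of 1.0, the samples `[10.0, 10.2, 10.0, 9.8]` are smoothed to approximately `[10.1, 10.1, 9.9, 9.9]`: the slow trend survives in the approximation coefficients while the small sample-to-sample jitter is discarded.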

Inter-domain  QoS  Automation  Using  Economics  (Suda).  We  have  investigated  the  inter-domain 
Resource  Exchange  (iREX)  architecture  for  the  automated  deployment  of  an  end-to-end  inter-domain  QoS 
policy  among  multiple  Internet  Service  Providers.  iREX  uses  economics  and  fully  distributed  mechanisms 
to  self-manage  the  deployment  of  this  policy  while  promoting  congestion-avoidance  by  enabling  a 
distributed  resource  selection  process  that  selects  the  least  congested  inter-domain  deployment  path. 

We  have  designed  the  architecture  and  protocols  for  iREX.  We  have  investigated  the  performance  of  the 
proposed protocols through extensive simulations. The simulation results show that iREX protocols
perform  well  under  different  simulation  configurations. 

Protocol classes supported in J-Sim.

FTP: file transfer protocol
BSD: Berkeley socket distribution
RIP: routing information protocol
DVMRP: distance vector multicast routing protocol
CBT: core based tree protocol
TSW: time sliding window
RED: random early drop
SRED: stable random early drop
RSVP: resource reservation protocol
RM: rate monotonic
DCTS: distance constrained task system
SCFQ: self-clocked fair queuing
STFQ: start time fair queuing

4.8  Tools for Supporting Network Simulation/Emulation

Simulation  of  Large  Distributed  Application-Level  Protocols  (Baker):  In  our  work  with  ad  hoc 
networks,  as  well  as  other  work  with  peer-to-peer  networks,  we  have  found  a  need  for  high-level  simulation 
tools,  that  is,  tools  that  can  simulate  application-level  protocols  in  large  networks  with  large  numbers  of 
flows  over  long  periods  of  time.  Packet-level  simulation  tools,  such  as  ns-2  and  the  new  JavaSim  (work  in 
this  project),  are  designed  to  provide  a  level  of  detail  that  is  not  geared,  for  instance,  for  simulation  of  a 
month’s  worth  of  application  behavior.  We  have  thus  designed  and  begun  implementation  of  Narses,  a  new 
simulator  targeted  towards  large  distributed  applications.  The  goal  of  Narses  is  to  validate  the  behavior  of 
large  applications  efficiently  using  network  models  of  varying  levels  of  detail.  For  efficiency,  our  lowest- 
level  construct  is  not  a  packet  but  an  application  “flow.”  We  introduce  several  assumptions  that  allow  us  to 
simulate  many  flows  over  many  nodes  over  long  periods  of  time.  One  assumption  is  that  the  bandwidth 
bottlenecks  will  be  the  last-hop  links  and  not  links  internal  to  the  Internet.  These  assumptions  mean  that 
Narses  is  not  appropriate  for  all  topologies,  but  it  is  appropriate  for  all  of  the  topologies  we  have  so  far 
desired  to  simulate,  including  large  Internet  topologies  with  thousands  of  nodes  and  tens  of  thousands  of 
flows. While fine-grained accuracy of simulated runtimes is not the main goal of Narses, early results with
our most detailed network model, as compared to ns-2, show up to a fivefold speedup in simulation time,
with  a  53%  decrease  in  memory  consumption  while  maintaining  a  reasonable  degree  of  accuracy  (within 
8%  on  average). 
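
The last-hop bottleneck assumption is what lets a flow be simulated as a single event rather than as thousands of packets. The Python sketch below shows the idea under simplifying assumptions that are ours, not necessarily those of Narses: each access link is shared equally among the flows using it, the network core has unlimited capacity, and a fixed latency is added once per flow. The function names are hypothetical.

```python
def flow_rate(sender_uplink_bps, sender_flows, receiver_downlink_bps, receiver_flows):
    """Rate of one flow when only last-hop links constrain bandwidth.

    Each access link is shared equally among the flows that use it; the
    network core is assumed to have unlimited capacity.
    """
    return min(sender_uplink_bps / sender_flows,
               receiver_downlink_bps / receiver_flows)

def transfer_time(size_bytes, rate_bps, latency_s=0.0):
    """Completion time of a flow simulated as a single event."""
    return latency_s + size_bytes * 8 / rate_bps
```

For instance, a 1,000,000-byte flow from a sender with a 1 Mbps uplink carrying two flows to a receiver with a 10 Mbps downlink carrying four flows is limited to 0.5 Mbps by the sender, and with 50 ms of latency it completes after about 16.05 simulated seconds, computed in one step.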

Design,  Implementation,  and  Module  Enhancement  of  J-Sim  (Hou):  We  have  carried  out  extensive 
research  and  development  tasks  to  realize  an  extensible,  reusable,  and  component-based  network  simulation 
and  emulation  environment,  as  shown  in  the  table  and  described  in  more  detail  below. 

Design  and  Implementation  of  Autonomous  Component  Architecture  (Hou):  We  have  designed  a 
component-based  software  architecture,  called  the  Autonomous  Component  Architecture  (ACA),  that 
deploys  a  message-passing,  independent  execution  model  to  more  closely  mimic  hardware  systems,  in  terms 
of  how  components  are  specified  and  assembled  and  how  components  interact  with  one  another.  To 
specify,  implement,  and  evaluate  ACA,  and  to  explore  its  application  to  building  large-scale,  extensible 
network  simulation/emulation  environments,  we  have  carried  out  the  following  tasks: 

(1)  We  have  implemented  a  proof-of-concept  version  of  ACA  in  Java.  Through  the  lessons  learned  from 
implementation  and  experimentation,  we  have  continued  to  refine/enrich  the  implementation.  For 
example,  we  have  designed  and  implemented  an  execution  context  management  and  scheduling 
mechanism  in  the  ACA  runtime.  We  have  also  fine-tuned  the  performance  of  the  ACA 
implementation  to  reduce  the  component  overhead  in  terms  of  execution  time  and  memory  usage. 

(2)  To  explore  how  ACA  facilitates  the  building  of  large-scale,  extensible,  and  reusable  software 
systems,  we  have  built,  on  top  of  the  ACA  implementation,  a  compositional  network  simulation 
environment,  called  J-Sim.  In  summary,  we  have  devised  a  packet-based  network  modeling 
framework,  called  the  Extensible  Internetworking  Framework  (EIF),  and  have  implemented  on  top 
of  the  ACA  and  EIF  implementation,  a  suite  of  network  protocols  in  the  Internet  best-effort  service, 
integrated  service,  and  differentiated  service  architectures.  The  table  gives  the  set  of  protocol 
classes  currently  supported  in  J-Sim.  We  have  also  extended  J-Sim  to  include  components  in  mobile 
wireless environments, i.e., antenna propagation models, terrain models, IEEE 802.11, power saving
mode,  and  ad  hoc  routing  classes  (DSR/AODV). 
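
The component model described in the two tasks above can be pictured with a small Python sketch in which components interact only by sending data through ports, and every delivery is queued and run separately rather than invoked as a direct method call. This is an illustration of the general style only. ACA and J-Sim are implemented in Java, and none of the class or method names below are taken from them.

```python
from collections import deque

class Port:
    """A named connection point; components interact only through ports."""
    def __init__(self, owner, name):
        self.owner, self.name, self.peers = owner, name, []

    def connect(self, other):
        """Wire two ports together, much like connecting two pins."""
        self.peers.append(other)
        other.peers.append(self)

    def send(self, data, scheduler):
        for peer in self.peers:
            scheduler.append((peer, data))   # deliver later, not by direct call

class Component:
    def __init__(self, name):
        self.name, self.ports = name, {}

    def port(self, name):
        if name not in self.ports:
            self.ports[name] = Port(self, name)
        return self.ports[name]

    def process(self, data, in_port, scheduler):
        """Override to react to data arriving at in_port."""

class Forwarder(Component):
    """Passes data to its 'out' port, tagging each hop with its name."""
    def process(self, data, in_port, scheduler):
        self.port("out").send(data + [self.name], scheduler)

class Sink(Component):
    def __init__(self, name):
        super().__init__(name)
        self.received = []

    def process(self, data, in_port, scheduler):
        self.received.append(data)

def run(scheduler):
    """Drain the event queue; each delivery is handled independently."""
    while scheduler:
        port, data = scheduler.popleft()
        port.owner.process(data, port, scheduler)
```

Because a component knows only its own ports, a protocol module written this way can be rewired into a different node or network model without changing its code, which is the kind of reuse the architecture aims for.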

Performance  Evaluation  (Hou):  We  have  conducted  extensive  stress  tests,  compared  the  performance  of 
J-Sim  against  ns-2  and  SSFNET,  and  made  a  detailed  qualitative  and  quantitative  comparison  (see 
http://www.javasim.org/comparison.html).  In  spite  of  the  overhead  inherited  from  the  component 
architecture,  J-Sim  demonstrates  better  scalability  (both  in  terms  of  simulation  completion  time  and 
experiment  setup  time)  in  large  simulation  scenarios  (e.g.,  in  the  case  that  the  number  of  nodes  >=  5000)  on 
a  dual-Pentium  III-600  MHz,  256MB  RAM  PC. 

Module  Enhancement  and  Extension  (Hou):  We  have  extended  J-Sim  to  include  components  for  carrying 
out  simulation  for  wireless  sensor  networks.  In  particular,  to  enable  high-fidelity  simulation  that  reflects 
how  Physical  layer  characteristics  can  impact  the  performance,  we  have  designed  and  implemented  detailed 
models  that  characterize  (i)  Physical  layer  characteristics  such  as  signal  propagation,  signal  attenuation  due 
to terrain and foliage, multi-path fading, and signal interference, and (ii) power consumption models for the
CPU, memory access, NIC processing, coding/modulation, and other associated circuitry (such as acoustic
sound,  seismic,  or  temperature  sensors  and  actuators).  We  are  in  the  process  of  incorporating  real-life 
traces  that  characterize  how  Physical  layer  characteristics  change  with  environmental  effects  (terrain, 
temperature,  obstacles,  number  of  concurrent  communication  activities). 

We  have  also  extended  J-Sim  to  realize  network  emulation.  Specifically,  we  have  developed  a  complete 
Java-compliant  socket  layer  on  which  real  applications  (e.g.,  web/ftp  servers  and  audio/video  applications) 
can  be  readily  ported,  thus  realizing  top-down  network  emulation.  We  have  also  leveraged  the  packet  filter 
facility  to  intercept  real-life  packets  at  the  device  driver  level  and  redirect  them  to  J-Sim,  thus  realizing 
bottom-up  network  emulation. 

Development of Fluid-Model-Based Simulation Techniques for Expediting Simulation (Hou): The major
obstacle in packet-level network simulation is the large number of packets that have to be simulated
in order to produce accurate results. Each packet generates a number of events (e.g., its arrival at
a router, its queuing, its departure, and buffer depletion) on the path from the source to the
destination.  In  wireless  environments  where  high-fidelity  results  can  be  obtained  only  with  simulation  at  the 
signal  level,  the  problem  is  even  worse.  Due  to  the  broadcast  nature  of  a  wireless  channel,  transmission  of  a 
signal  must  be  received  and  processed  by  all  nodes  operating  on  the  same  channel  (and  neighboring 
channels  if  co-channel  interference  is  taken  into  account).  This  characteristic  of  a  wireless  channel  implies 
that  one  signal  transmission  event  will  generate  numerous  signal  receipt  events.  As  the  CPU  time  required 
is  roughly  proportional  to  the  number  of  events  that  have  to  be  processed,  packet-level  simulation  easily 
becomes  computationally  expensive,  if  not  infeasible,  when  the  network  size  and/or  the  amount  of  traffic  is 
extremely  large.  In  our  simulation  study,  we  have  shown  that  to  simulate  a  typical  WiFi  scenario  in  which  n 
nodes operate in an 802.11-operated wireless LAN, each of which sends CBR traffic at the rate of 0.5 Mbps
(with  packet  size  set  to  25  bytes),  it  takes  17083  seconds  (4.75  hours)  in  real  time  to  carry  out  a  60-second 
simulation  run  in  the  case  of  n=100  nodes. 
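
A back-of-the-envelope count shows why this scenario is so expensive. The Python sketch below assumes that 0.5 Mbps means 500,000 bits per second of payload, that header overhead and retransmissions are ignored, and that every node hears every transmission; these simplifications are ours and are not stated in the simulation study.

```python
def wlan_event_counts(nodes, rate_bps, packet_bytes, duration_s):
    """Count packet transmissions and the receipt events they trigger."""
    packets_per_node = rate_bps * duration_s // (packet_bytes * 8)
    transmissions = nodes * packets_per_node
    receipts = transmissions * (nodes - 1)   # every other node hears each one
    return transmissions, receipts

tx, rx = wlan_event_counts(nodes=100, rate_bps=500_000, packet_bytes=25, duration_s=60)
print(tx, rx)   # 15000000 1485000000
```

Under these assumptions a 60-second run with 100 nodes involves 15 million transmissions and roughly 1.5 billion signal receipt events, before any queuing or MAC-level events are counted.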

To solve this problem, we incorporated theoretical models, both from the literature and developed in
this PERC project, into the simulation in order to reduce the number of events. Conceptually, a certain portion
of  the  network  traffic  (in  the  MAC  layer)  is  modeled  as  a  continuous  fluid  flow,  rather  than  as  discrete 
packet  instances.  A  cluster  of  closely  spaced  packets  is  modeled  as  a  single  fluid  chunk  with  a  fluid  rate. 
An  analytical  model  (in  the  form  of  a  set  of  differential  equations)  is  then  derived  to  characterize  the  system 
evolution.  The  simulation  environment  (with  the  theoretical  models  incorporated)  keeps  track  of  the  fluid 
rate  changes  at  the  traffic  sources  and  router  queues.  Because  a  large  number  of  packets  is  abstracted  as  a 
single  fluid  chunk,  less  computation  is  needed  to  simulate  network  traffic.  We  have  had  success  in  applying 
the  aforementioned  theoretical  model-based  simulation  to  congested  802.11  wireless  networks  and  TCP 
congestion  control. 
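
The fluid abstraction can be illustrated with a single router queue. The Python sketch below integrates the textbook fluid equation dq/dt = a(t) - C with forward Euler steps, where a(t) is the aggregate arrival rate and C the link capacity, and clamps the queue to the buffer size. It is a generic example under our own assumptions (one queue, fixed step size, hypothetical function name) and is not the set of differential equations derived in the project.

```python
def simulate_fluid_queue(arrival_rate, capacity, buffer_size, duration, dt=0.001):
    """Integrate dq/dt = arrival_rate(t) - capacity with forward Euler.

    arrival_rate is a function of time (units per second); the queue
    length is clamped to [0, buffer_size]. Returns (final queue, lost).
    """
    queue, lost = 0.0, 0.0
    steps = int(round(duration / dt))
    for step in range(steps):
        t = step * dt
        queue += (arrival_rate(t) - capacity) * dt
        if queue > buffer_size:
            lost += queue - buffer_size   # overflow is dropped
            queue = buffer_size
        elif queue < 0.0:
            queue = 0.0
    return queue, lost
```

The cost of this loop depends on the number of time steps, not on the number of packets: a source sending 1,200 packets per second into a link that serves 1,000 per second takes the same 1,000 steps to simulate for one second as a source sending a thousand times more.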


5.  Impact 

5.1  Protocols  for  Supporting  Real  Time  Resource  Management 

•  Jennifer  Hou  has  taken  to  a  rigorous  level  the  pioneering  work  of  Tuan  and  Park  on  using  LRD  for 
congestion  control  and  a  simple  heuristic  estimation  scheme,  based  on  conditional  expectations, 
for  exploration  of  the  correlation  structure.  In  her  work,  the  traffic  prediction  is  rigorously  made 
with  the  use  of  a  LMMSE  predictor,  and  the  calculation  of  packet  dropping  probability  in  AQM 
and/or  the  window  adjustment  in  TCP  congestion  control  is  pinpointed  in  the  context  of  steady- 
state  dynamics. 

5.2  Protocols  and  Infrastructure  for  Supporting  Fault  Tolerance 

•  Michael  Melliar-Smith,  Louise  Moser  and  their  Ph.D.  student  Wenbing  Zhao  received  the  best 
paper  award  for  their  work  on  performance  measurement  and  analysis  of  a  multicast  group 
communication  protocol  for  a  fault  tolerance  infrastructure. 

•  Michael Melliar-Smith and Louise Moser participated in the development of the OMG’s Fault
Tolerant  CORBA  and  Online  Software  Upgrades  standards. 

•  Michael  Melliar-Smith  and  Louise  Moser  wrote  the  OMG’s  Reliable  Ordered  Multicast  Request 
for  Proposals. 

•  Michael  Melliar-Smith  and  Louise  Moser  participated  in  the  development  of  the  Service 
Availability Forum's Application Interface Specification.

•  Michael  Melliar-Smith  and  Louise  Moser  designed,  and  supervised  the  implementation  and  release 
of,  Eternal  Systems'  Fault  Tolerant  CORBA  product  to  NSWC,  Dahlgren. 

•  Michael  Melliar-Smith  and  Louise  Moser  designed,  and  supervised  the  implementation  and  release 
of,  Eternal  Systems'  Duration  checkpointing  and  TCP  failover  product. 

•  Michael  Melliar-Smith  and  Louise  Moser  invented  technology  and  prepared  patent  applications  on 
Online  Software  Upgrades,  High  Availability,  Fault  Tolerance,  Message  Logging,  Checkpointing 
and  Fault  Recovery,  which  were  transferred  and  assigned  to  Eternal  Systems. 

5.3  Methodology  and  Protocols  for  Supporting  Security 

•  Karl  Levitt  and  his  students  developed  intrusion  detection  methodology  that  is  suitable  for  a 
variety  of  MANET  protocols,  including  AODV,  OLSR  and  DRCP. 

•  Karl  Levitt  and  his  students  developed  a  formal  reasoning  and  verification  methodology  applicable 
to  intrusion  detection  rules  in  the  wired  as  well  as  wireless  MANET  environment. 

•  Karl  Levitt  and  his  students  developed  automatic  response  models  for  application  in  the  wired  as 
well  as  wireless  MANET  environment. 

•  Karl  Levitt  and  his  students  developed  an  intrusion  detection  architecture  suitable  for  bandwidth 
constrained  MANETs. 

•  Karl  Levitt  and  his  students  developed  worm  defense  strategies. 

•  Michael  Melliar-Smith,  Louise  Moser  and  their  Ph.D.  student  Kim  Kihlstrom  received  the  Wilkes 
Award  for  their  work  on  Byzantine  fault  detectors. 

•  Elizabeth  Belding-Royer  and  her  Ph.D.  students  developed  and  released  simulation  code  for 
ARAN,  an  authenticated  routing  protocol  for  ad  hoc  networks. 

•  Elizabeth  Belding-Royer  and  her  Ph.D.  students  developed  and  released  implementation  code  for 
ARAN  that  runs  on  small  handheld  devices. 

•  Elizabeth  Belding-Royer  and  her  Ph.D.  students  developed  and  released  AODVSTAT,  the  first  ad 
hoc  intrusion  detection  system,  both  as  a  simulation  and  implementation  in  a  testbed. 


5.4  Protocols  for  Supporting  Scalable  Real-Time  Multicasts 

•  Michael  Melliar-Smith  and  Louise  Moser  invented  a  low-latency  real-time  multicast  protocol, 
which  their  Ph.D.  student  Wenbing  Zhao  implemented.  The  technology  was  transferred  to  Eternal 
Systems. 

•  The  research  of  Jennifer  Hou  on  QoS-driven  multicast  routing  has  direct  applicability  to  the 
development  and  implementation  of  network  protocols/solutions  in  support  of  resource 
management  in  network-centric  warfare  scenarios.  Jointly  with  Fujitsu  Labs  of  America,  Professor 
Hou  has  presented  the  QoS-driven  multicast  routing  protocol  in  the  Inter-Domain  Multicast 
Routing Working Group of IETF.

5.5 Protocols for Supporting Mobile Wireless Environments

•  J.  J.  Garcia-Luna-Aceves  developed  and  released  the  ROMA  medium  access  control  protocol, 
which  was  used  by  Raytheon  in  demonstrations  to  DARPA  in  the  FCS  program.  ROMA  was  the 
first  MAC  protocol  based  on  transmission  scheduling  applicable  to  directional  antennas. 

•  J.  J.  Garcia-Luna-Aceves  demonstrated  that  AODV  is  not  fail-safe  when  nodes  can  delete  routing 
state,  which  may  lead  to  counting-to-infinity  and  looping,  and  published  several  approaches  to 
solve  AODV’s  limitations. 

•  Jennifer  Hou  invented  the  LMST  algorithm,  the  first  localized  topology  control  algorithm  reported 
in  the  literature.  Professor  Hou  has  worked  with  Lockheed  Martin  to  implement  the  algorithm  in 
their  testbed,  as  part  of  the  DARPA  control-based  MANET  program. 

• Jennifer Hou conducted one of the first comprehensive and systematic studies of the
performance limits of wireless sensor networks, based on several node and network characteristics
(critical power, critical density, critical radius, network coverage, connectivity, capacity, and
lifetime).

• Based on the critical conditions derived above, Jennifer Hou and her students designed localized
algorithms and protocols that approach the performance limits derived in their asymptotic study.

• Leveraging J-Sim, Jennifer Hou carried out a detailed, high-fidelity simulation study,
validating these conditions in practice. The extensions to J-Sim will be released as open source,
as has been done for the previous J-Sim release (http://www.j-sim.org).

•  Michael  Melliar-Smith  and  Louise  Moser  and  their  students  developed  natural  language  speech 
input  applications,  and  demonstrated  effective  speech  recognition  on  an  OQO  mobile  handheld 
device.  This  technology  has  attracted  considerable  interest,  in  particular,  from 

Hossein  Moiin,  VP,  T-Mobile 
Norman  Winarsky,  VP,  SRI  International 
Robert  Scott,  Nokia 

who are considering possible commercialization of this technology.

5.6  Bio-Networking  Architectures 

• BBC News featured an article that gives an overview of the motivation and vision of the
Bio-Networking Architecture developed by Tatsuya Suda (see
http://news.bbc.co.uk/1/hi/sci/tech/764085.stm). The article describes the adaptive behaviors of
network applications in the Bio-Networking Architecture.

•  Key  protocol  designs  in  the  Bio-Networking  Architecture  have  been  adopted  as  a  reference 
architecture  at  the  Super  Distributed  Objects  group  of  the  Object  Management  Group. 


5.7  Middleware  for  Supporting  Application  Programs  with  End-to-End  QoS 

•  Raymond  Klefstad  and  Douglas  Schmidt  developed  and  released  ZEN  and  RTZen,  which 
implements  RT-CORBA  in  real-time  Java. 

•  Raymond  Klefstad  developed  and  refined  the  pluggable  transport  protocol  framework  in  RTZen, 
which  was  later  replaced  by  the  Extensible  Transport  Framework,  which  has  become  a  part  of  the 
OMG  standard. 

• Raymond Klefstad released RTZen, the first available open-source RTSJ CORBA middleware
implementation.

•  RTZen  is  the  largest  RTSJ  software  to  date  and  is  used  as  a  test  program  for  real-time  JVMs,  e.g. 
OVM by Purdue University and jRate.

• RTZen, developed by Raymond Klefstad and his students, was evaluated by JPL, NASA, and
Raytheon. They presented RTZen at the “Java day” event at Raytheon. RTZen was considered
for use in the DD(X).

•  RTZen,  developed  by  Raymond  Klefstad  and  his  students,  was  used  in  PrismJ,  which  is  a  part  of  a 
capstone  demo  for  the  DARPA  PCES  (Program  Composition  for  Embedded  Systems)  program. 

•  Michael  Melliar-Smith,  Louise  Moser  and  their  Ph.D.  student  Vana  Kalogeraki  participated  in  the 
development  and  release  of  the  OMG  Dynamic  Scheduling  standard. 

5.8  Tools  for  Supporting  Network  Simulation/Emulation 

• Jennifer Hou released as open source J-Sim, which provides an extensible, reusable network
simulation and emulation environment. Since its first release in October 2001, more than 5000
users have downloaded the software, including NIST, Oak Ridge National Laboratory, Fujitsu
Labs of America, IBM, CMU, University of Toronto, Purdue University, Dartmouth College,
Georgia Tech, and Renesys, Inc., many of whom have returned for the latest release, V. 1.3.
Extensions to wireless networks, wireless sensor networks, active networks, MPLS, and security
mechanisms have been made, some of which were provided by J-Sim users from Europe, Asia, and
the United States.


6.  Education  and  Student  Training 

This  MURI  grant  supported  the  following  Postdoctoral,  Graduate  Student  and  Undergraduate  Student 
Researchers: 

University  of  California,  Santa  Barbara,  Professor  Michael  Melliar-Smith,  Professor  Louise  Moser 

•  Ruppert  Koch,  Graduate  Student  Researcher,  Ph.D.,  Fall  2000,  Postdoctoral  Researcher,  2002-2003 

• Prithviraj Dasgupta, Graduate Student Researcher, Ph.D., Spring 2001, Postdoctoral Researcher,
Summer 2001

• Wenbing Zhao, Graduate Student Researcher, Ph.D., Spring 2002, Postdoctoral Researcher,
2002-2004

•  Lauren  Tewksbury,  Graduate  Student  Researcher,  Ph.D.,  Spring  2001 

•  Nitya  Narasimhan,  Graduate  Student  Researcher,  Ph.D.,  Spring  2001 

•  Nearchos  Paspalli,  Graduate  Student  Researcher,  M.S.,  Fall  2002 

•  Aditya  Singh,  Graduate  Student  Researcher,  M.S.,  Spring  2003 

•  Xiaoli  Chen,  Graduate  Student  Researcher,  M.S.,  Summer  2003 

•  Rachit  Chawla,  Graduate  Student  Researcher,  M.S.,  Winter  2004 

•  Karthik  Perumalsamy,  Graduate  Student  Researcher,  M.S.,  Summer  2005 

•  Brendan  Sever,  Undergraduate  Student  Researcher,  B.S.,  Fall  2005 

•  Rama  Alebouyeh,  Graduate  Student  Researcher 

•  Michael  Schuricht,  Graduate  Student  Researcher 

•  Yi  Xin  Hu,  Graduate  Student  Researcher 

•  Shreyas  Prasad,  Undergraduate  Student  Researcher 

•  Zachary  Davis,  Undergraduate  Student  Researcher 

University of California, Santa Barbara, Associate Professor Elizabeth Belding-Royer

•  Sumit  Gwalani,  Graduate  Student  Researcher,  M.S.,  September  2003 

•  Swaminathan  Sundaramurthy,  Graduate  Student  Researcher,  M.S.,  September  2003 

•  Kavitha  Srinivasan,  Graduate  Student  Researcher,  M.S.,  December  2004 

• Kimaya Sanzgiri, Graduate Student Researcher

• Irfan Sheriff, Graduate Student Researcher

University of California, Irvine, Professor Tatsuya Suda

•  Dr.  Lubomir  Bic,  Faculty  Member 

•  Dr.  Tadashi  Nakano,  Postdoctoral  Researcher 

•  Dr.  Jun  Suzuki,  Postdoctoral  Researcher 

•  Ryota  Egashira,  Graduate  Student  Researcher 

•  Akihiro  Enomoto,  Graduate  Student  Researcher 

•  Keita  Fujii,  Graduate  Student  Researcher 

•  Yan  Huang,  Graduate  Student  Researcher 

•  Jim  Lu,  Graduate  Student  Researcher 

•  Michael  Moore,  Graduate  Student  Researcher 

•  Ariffin  B.  Yahaya,  Graduate  Student  Researcher 

•  Mei  Yang,  Graduate  Student  Researcher 

•  Takashi  Hashimoto,  Visiting  Researcher 

•  Dr.  Naoto  Miyoshi,  Visiting  Researcher 

University  of  California,  Irvine,  Professor  Raymond  Klefstad 

•  Juan  Colmenares,  Graduate  Student  Researcher 

•  Shruti  Gorappa,  Graduate  Student  Researcher 

•  Trevor  Harmon,  Graduate  Student  Researcher,  M.S.  June  2005 

•  Jie  Hu,  Graduate  Student  Researcher 

•  Hojjat  Jafarpour,  Graduate  Student  Researcher 

•  Jinhwan  Lee,  Graduate  Student  Researcher 

•  Mark  Panahi,  Graduate  Student  Researcher,  M.S.  December  2004 

•  Krishna  Raman,  Graduate  Student  Researcher,  M.S. 

• Gunar Schirner, Graduate Student Researcher, M.S.

•  Chia-Yen  Shih,  Graduate  Student  Researcher 

•  Yue  Zhang,  Graduate  Student  Researcher 

University of California, Santa Cruz, Professor J. J. Garcia-Luna-Aceves

•  Lichun  (Luke)  Bao,  Ph.D.,  December  2002 

•  Soumya  Roy,  Ph.D.,  June  2003 

•  Marc  Mosko,  Ph.D.,  May  2004 

•  Yu  Wang,  Ph.D.,  May  2004 

•  Marco  Aurelio  Spohn,  Ph.D.,  July  2005 

•  Ravindra  Vaishampayan,  Ph.D.  Expected  2006 

•  Hari  Rangarajan,  Ph.D.  Expected  2006 


University of California, Davis, Professor Karl N. Levitt

• Poornima Balasubramanyam, Postdoctoral Researcher

•  Jeff  Rowe,  Postdoctoral  Researcher 

•  Akshay  Aggarwal,  Graduate  Student  Researcher,  M.S.  2003 

•  Marcus  Tylutki,  Graduate  Student  Researcher,  Ph.D.  August  2005 

• Tao Tsong, Graduate Student Researcher, Ph.D. expected Spring 2006

•  Henry  Tseng,  Graduate  Student  Researcher 

•  Angelene  Wang,  Graduate  Student  Researcher 

•  Ivan  Balepin,  Graduate  Student  Researcher 

•  Allen  Ting,  Graduate  Student  Researcher 


University  of  Illinois  at  Urbana  Champaign,  Professor  Jennifer  Hou 

•  Hung-Ying  Tyan,  Graduate  Student  Researcher,  Ohio  State  University,  Ph.D.  Fall  2001 

•  Yuan  Gao,  Graduate  Student  Researcher,  Ohio  State  University,  Ph.D.  September  2002 

•  Guanghui  He,  Graduate  Student  Researcher,  University  of  Illinois  at  Urbana  Champaign,  Ph.D. 
August  2004 

•  Ning  Li,  Graduate  Student  Researcher,  University  of  Illinois  at  Urbana  Champaign,  Ph.D.  August 
2005 

• Honghai Zhang, Graduate Student Researcher, University of Illinois at Urbana Champaign,
Ph.D. August 2005


7.  Publications  During  the  Period  of  the  Grant 

A.  Journal  Papers 

A.  Shum,  P.  M.  Melliar-Smith  and  L.  E.  Moser,  “Design  and  evaluation  of  the  Fibonacci  optical  ATM 
switch,”  IEEE  Transactions  on  Computers,  vol.  50,  no.  5  (May  2001),  466-481. 

L.  E.  Moser,  P.  M.  Melliar-Smith,  P.  Narasimhan,  R.  R.  Koch  and  K.  Berket,  “A  multicast  group 
communication  protocol,  engine  and  bridge  for  CORBA,”  Concurrency  and  Computation:  Practice  and 
Experience, volume 13, issue 7 (June 2001), 579-603.

tolerant CORBA applications,” Computer System Science and Engineering Journal 17, 2 (March 2002),
103-114.

K. P. Kihlstrom, L. E. Moser and P. M. Melliar-Smith, “Unreliable Byzantine fault detectors for solving

MURI project and of prior DARPA projects at UCSB. Specific technology transferred to Eternal Systems
has included Fault Tolerant CORBA technology, consistent checkpointing mechanisms, TCP connection
failover  technology,  and  low-latency  real-time  multicast  and  group  membership  protocols.  Patents  related 
including Intel, Sun, IBM, HP, Nokia, Motorola, and Fujitsu-Siemens that is developing standards for high
availability  and  fault  tolerance  middleware  for  telecommunications  applications.  The  Application  Interface 
Networking platform in the OMG Super Distributed Objects specification. A detailed list of artifacts and
software  is  included  below. 

Jennifer Hou has released JavaSim as freely available open source at http://www.javasim.org.

• Fault Tolerance for Computer Programs that Operate over a Communication Network;
20020099973,  filed  October  26,  2001;  6,922,792  granted  July  26,  2005. 

• Consistent Message Ordering for Semi-Active and Passive Replication; 20040103342, filed July
29, 2003; 6,928,577 granted August 9, 2005.

•  Transparent  Consistent  Active  Replication  of  Multithreaded  Application  Programs;  20040078617, 
filed  March  24,  2003. 

•  Transparent  Consistent  Semi-Active  and  Passive  Replication  of  Multithreaded  Application 
Programs;  20040078618,  filed  March  25,  2003. 

•  Transparent  TCP  Connection  Failover;  20040268175,  filed  June  11,  2003. 

• Consistent Asynchronous Checkpointing of Multithreaded Application Programs Based on
Semi-active or Passive Replication; 20050034014, filed August 30, 2003.

•  Consistent  Asynchronous  Checkpointing  of  Multithreaded  Application  Programs  Based  on  Active 
Replication,  filed  August  30,  2003. 

•  Consistent  Time  Service  for  Fault-Tolerant  Distributed  Systems;  20040205372,  filed  January  2, 
2004. 

These  patent  applications  and  patents  have  been  transferred  and  assigned  to  Eternal  Systems. 

Tatsuya Suda has been granted one patent and has filed another patent application, namely:

•  A  Framework  for  Creating  Adaptive  Network  Services;  2002-175405,  filed  in  December  2000, 
P3708432,  granted  in  October  2005. 

• A Framework for Information and Service Distribution; 2005-103154, filed in March 2005.

12. Software Releases

Using technology developed with support in part from this MURI project, Michael Melliar-Smith and
Louise Moser helped Eternal Systems deliver its high availability and fault tolerance software products to
multiple customers, including NSWC, Dahlgren:

•  FTORB  Fault  Tolerant  CORBA  product 

•  Duration  High  Availability  Checkpoint/Restore  product 

Elizabeth  Belding-Royer  and  her  students  released  the  following  software  package: 

• ARAN, Version 0.1, May 1, 2004

Raymond  Klefstad  and  Douglas  Schmidt  and  their  students  released  the  following  software  packages: 

•  ZEN,  Version  01.00.01,  September  19,  2003 

• RTZen, Snapshot 588, August 5, 2004

•  ZEN-kit,  Version  0.1,  May  1,  2005 

Tatsuya  Suda  and  his  students  have  released  the  following  artifacts  and  software: 

• Software for the Bio-Networking Architecture

    • Dynamic Service Composition software, http://netresearch.ics.uci.edu/kfujii/dsc/

    • Adaptation and evolution simulator,
    http://netresearch.ics.uci.edu/bionet/resources/evolution_simulator/index.html

    • Bio-Networking platform software, http://netresearch.ics.uci.edu/bionet/resources/platform/

• Software for Distributed Discovery in Peer-to-Peer Systems

    • Similarity-based Peer-to-Peer discovery protocol simulator,
    http://netresearch.ics.uci.edu/mikemo/

    • Preference-based Peer-to-Peer discovery protocol simulator,
    http://netresearch.ics.uci.edu/bionet/resources/discovery/ryota/discovery_simulator.zip

• Software for Protocols for Sensor Networks

    • Protocol for Coverage-Aware Sensor Engagement in Sensor Networks,
    http://bolero.ics.uci.edu/lujun/simulators/CASE-ns2.tar.gz

    • Protocol for Monitoring a Target from Multiple Viewing Angles in Sensor Networks,
    http://bolero.ics.uci.edu/jidongw/simulators/

    • Protocol for Protecting Targets in Sensor Networks,
    http://www.ics.uci.edu/~meiy/mobilesensor.html

    • Protocol for Data Dissemination in Sensor Networks,
    http://bolero.ics.uci.edu/lujun/simulators/LWDD-ns2.tar.gz

• Software for Protocols for QoS

    • Router Queue Management for Mixed Traffic,
    http://bolero.ics.uci.edu/ypan/simulators/RQM.tar.gz

    • Inter-domain QoS Automation using Economics,
    http://bolero.ics.uci.edu/ariffin/simulators/iREX.tar.gz

13. Honors


The paper, “End-to-end latency analysis and evaluation of a fault-tolerant CORBA infrastructure,” by W.
Zhao, L. E. Moser and P. M. Melliar-Smith, received the best paper award at the International Symposium
on Performance Evaluation of Computer and Telecommunication Systems, 2002.

The  paper,  “Unreliable  Byzantine  fault  detectors  for  solving  consensus”  by  K.  P.  Kihlstrom,  L.  E.  Moser 
and  P.  M.  Melliar-Smith,  received  the  Wilkes  Award  from  the  British  Computer  Society's  Computer 
Journal,  2003. 

The  paper,  “Design  and  performance  of  a  dynamically  configurable,  messaging  protocols  framework  for 
real-time  CORBA,”  by  R.  Klefstad,  S.  Rao  and  D.  C.  Schmidt,  received  the  best  paper  award  in  the 
Distributed  Object  and  Component-based  Software  Systems  part  of  the  Software  Technology  Track  at  the 
36th  Annual  Hawaii  International  Conference  on  System  Sciences,  2003. 

The  paper,  “Service  emergence  based  on  relationship  among  self-organizing  entities,”  by  T.  Itao,  T. 
Nakamura,  M.  Matsuo,  T.  Suda  and  T.  Aoyama  won  the  best  paper  award  at  the  IEEE/IPSJ  Symposium  on 
Applications  and  the  Internet,  2002. 

The  paper,  “A  method  of  service  composition  using  service  attributes,”  by  M.  Imada,  Y.  Katayama,  M. 
Matsuo  and  T.  Suda  received  the  best  paper  award  at  the  IEICEJ  Assurance  Symposium,  2001. 

The  paper,  “Coverage  aware  sensor  engagement  in  dense  sensor  networks,”  by  J.  Lu,  L.  Bao  and  T.  Suda, 
won  the  best  paper  award  at  the  IFIP  International  Conference  on  Embedded  and  Ubiquitous  Computing, 
2005. 

Louise Moser received the Service Availability Forum Service Award in 2003 for her standards work as
editor of the Application Interface Specification.

Elizabeth  Belding-Royer  was  recognized  as  one  of  the  Top  100  World  Young  Investigators  by  MIT 
Technology  Review  in  2002. 

Elizabeth  Belding-Royer  received  an  NSF  Career  Award  on  Mobile  Network  Support  for  Collaborative 
Environments  in  2004. 

J.  J.  Garcia-Luna-Aceves  was  elected  IEEE  Fellow  in  2005  “for  contributions  to  theory  and  design  of 
communication  protocols  for  network  routing  and  channel  access.” 

J.  J.  Garcia-Luna-Aceves  received  the  Jack  Baskin  Chair  of  Computer  Engineering,  July  1,  2003  -  June  30, 
2008. 

Jennifer  Hou  received  the  Lumley  Research  Award  at  Ohio  State  University  in  2001. 

Jennifer  Hou  received  an  ACM  Service  Award,  in  appreciation  for  her  contributions  to  ACM  as  program 
chair  for  the  Third  International  Symposium  on  Information  Processing  in  Sensor  Networks  in  2004. 